INTRODUCTORY.


Let me preface this paper with a brief quotation from
Westergaard’s treatise on “Scope and Method of Statistics,” which
appeared in the QUARTERLY PUBLICATIONS of this Association
last June. “A prominent feature of modern statistics is the
immense development of the field. . . . In the future 
the difficulty will be not so much in gathering material as in 
mastering it, in digesting all these masses of reports which have 
been stored in the archives and on the book-shelves of statis- 
tical offices.” 

In the future the difficulty will be not so much in gathering 
material as in mastering and digesting it. These words strike 
a definite keynote in regard to the modern problem of statis- 
tics. A good digestive apparatus thoroughly utilizes the 
material offered to it, discarding what is unnecessary or ill- 
adapted and reconstructing what is useful into the form best 
suited to the general organism of which it is a function. Similarly,
the statistician of today must utilize the material at his
disposal first by discarding the useless, the irrelevant, and 
the defective, and then by rearranging the remainder into the 
form and substance best suited to the needs of the organization 
with which his interests are bound up. 

Statistics are now collected and compiled on every conceivable
subject under the broad heavens. This activity is carried
on in large part by means of governmental, but in part also
by private or voluntary organizations. The principal problem
of today is to utilize the great mass of material already available,
together with what is constantly being added, so as to
make it of the greatest service to society.

* Paper presented at the annual meeting of the American Statistical Association, Columbus, Ohio

Restricting ourselves to the question of statistics as utilized 
in the modern business world, we may ask: First, what or- 
ganizations collect statistics of interest to business; Second, 
what use is actually made of statistics in business practice. 

By “business” I mean in a general way all forms of effort 
put forth by modern society primarily and directly for eco- 
nomic gain. This definition excludes such activities as are 
devoted to general informational, political, educational, and 
scientific purposes, even though many of these activities may 
be, and some undoubtedly are, of indirect service to business. 
In other words, I shall confine myself strictly to the direct 
and conscious efforts of the business world to secure and 
utilize statistical information for its own economic and finan- 
cial benefit. Even thus restricted, the field presents a much 
greater extent than could possibly be covered within the limits 
of a brief paper, and I shall attempt little more than to indicate 
the several lines along which development seems to be tend- 
ing most rapidly today. 


ORGANIZATIONS THAT COLLECT BUSINESS STATISTICS. 


It is clear that the primary sources of information of service 
to the business world are those afforded by governmental 
agencies; in especial, the agencies of the federal government. 
Without going into detail, it may be said that the Depart- 
ments of Commerce, Agriculture, and the Treasury, and the 
Interstate Commerce Commission, are the principal federal 
agencies compiling business statistics, although we must not 
overlook the Department of Labor, various independent 
boards such as the Federal Reserve Board, Federal Trade Com- 
mission, Farm Loan Board, the newly created shipping and 
tariff boards, and the like. My effort here, however, is to 
indicate the compilation activities of private rather than of 
governmental or public agencies. The number of private 
agencies is legion, and can be discussed briefly only by general
grouping.

First, we have organizations maintained individually or
coöperatively by industrial corporations, whose investigations
and compilations are designed to cover the whole of the
industry in which the particular corporations are engaged. 
Perhaps the oldest of such organizations now in existence is the 
American Iron and Steel Institute, which was established in 
1855, under a slightly different name, and has published
statistical reports of the iron and steel industry ever since. 
One clause of the constitution adopted in 1855 read as follows: 
“The general objects of this association shall be to procure 
regularly the statistics of the trade both at home and abroad.” 
This clause was retained in the constitution when the 
American Iron and Steel Institute replaced the older associa- 
tion, and may be found there today. In pursuance of this 
constitutional provision for the collection of statistics of the 
industry, the secretary of the association compiled in 1859 
a guide to the “iron works and iron ore mines of the United 
States.” This was the only available compilation of data
regarding American furnaces, rolling mills, forges, and steel 
works until 1879, when the association published the first 
edition of its directory to the iron and steel works of the 
United States, which has been reissued from time to time, 
down to the present year. The American Iron and Steel 
Institute in 1913 established a statistical bureau, which 
issues annual statistical reports covering the production, 
imports and exports, and prices of iron and steel, iron ore, 
coal and coke. In addition, the bureau issues bulletins from 
time to time giving current statistics of production of pig iron, 
steel ingots and castings, steel rails, nails, pipe, plates, etc. 

I have described the statistical work of the Iron and Steel 
Institute in some detail, both because it is probably the oldest 
existing organization of the kind, and because it covers the 
field represented by its industry in an unusually complete and 
satisfactory manner. Coöperative statistical bureaus or
organizations of some kind are maintained in a number of 
other industries, besides that of iron and steel manufacture. 
The form, organization, and title of these bureaus vary consid- 
erably from industry to industry, but they have this one 
feature in common: they collect and compile statistics dealing
with their respective industries. Such bureaus or offices
exist in the electric railway industry; among life insurance 
corporations; and in the steam railway industry. In addi- 
tion, a number of national associations representing particular 
industries have among their aims the collection and dissemi- 
nation of statistical data regarding their respective activities. 
In especial, I would mention the American Forestry Associa- 
tion, which compiles forest statistics; the United States 
Brewers’ Association, which publishes a yearly handbook on 
the status of the trade; the American Foundrymen’s Associa- 
tion, organized in part to collect all proper information con- 
nected with the foundry business; the American Mining 
Congress, which strives to disseminate information in relation 
to mining, metallurgy, and allied industries; the Copper Producers’
Association, which compiles copper statistics; the
American Water Works Association; the National Association
of Cotton Manufacturers, which collects and imparts information
relating to the cotton industry; the Street Railway
Association of the State of New York, organized for the ac- 
quisition and dissemination of experimental, statistical, and 
scientific knowledge relating to the construction, equipment, 
and operation of street railways. 

Among the statistical offices created by and among the 
steam railways are two committees of the American Railway 
Association, one on relations between railroads and the other 
on accident statistics. The first of these committees was 
created in 1907 and has been publishing statistics of freight 
car surpluses and shortages ever since that date; also, a part 
of the time, statistics of freight car performance. The other 
committee was created in 1915 for the purpose of making 
suggestions for the improvement of the methods of reporting 
and compiling statistics of railway accidents. Other statis- 
tical bureaus organized by the railways have been the Special 
Committee on the Relation of Railway Operations to Legis- 
lation, which has been in existence since 1909 and publishes 
statistics of laws introduced and passed by state legislatures 
each year relating to railway operations; the Bureau of Rail- 
way News and Statistics, maintained in Chicago by a group 
of railways, which has been in existence since 1904 and which
compiles annually and periodically statistics of various aspects
of the railway question; the Bureau of Railway Economics, 
organized at Washington in 1910 by railways of the United 
States, whose work is largely statistical, consisting of various 
publications on railway revenues and expenses, traffic, stock- 
holders, comparative railway statistics of the United States 
and foreign countries, wages, equipment, etc. This bureau 
also compiles statistical exhibits for rate cases, wage arbitra- 
tions, and the like, and prepares statistical material on various 
topics as the result of special inquiries. 

A number of large coöperative bodies of business men also
maintain organizations for the compilation and collection, 
or in some cases the improvement, of business statistics. In
this class is the recently created Committee on Statistics and
Standards of the Chamber of Commerce of the United States.
This national federation of local bodies of business
men—the United States Chamber of Commerce—was organ- 
ized in April, 1912. Among other functions it utilizes the 
commercial data gathered by government bureaus by direct- 
ing it into the channels to which it is immediately applicable, 
and analyzes statistics with regard to the production and 
distribution of manufactures at home and abroad. In other 
words, the chamber digests and adapts existing business 
information to the needs of the business bodies constituent 
to it. 

In the second place, we have statistical organizations 
established and maintained by individual corporations. 
There are so many of these that I can allude only briefly to a 
few, which may be regarded as representative of the general 
situation. First of all may be mentioned the statistical organ- 
izations of various brokerage and financial firms, including 
the larger banks of the country. An excellent example of 
statistical organizations maintained by banks is that of the 
National City Bank of New York. The statistical work of 
that bank consists primarily of the preparation of statistical 
statements in response to calls from officers of the bank re- 
garding commercial, financial, and industrial conditions in 
the United States and all other sections of the world. In 
addition, there is prepared each week a statistical analysis
of the import and export trade of the port of New York.
While these statements are prepared primarily for the officers 
of the bank, they are also supplied freely, in the interest of 
foreign commerce, to any one making application for them. 

Other large banks have statistical departments, as do nearly 
all of the important banking and brokerage houses in the 
large cities. Speaking generally, these statistical departments 
keep track of occurrences and possibilities in the industrial 
and financial field, and present their findings in general or 
special reports. 

Many manufacturing and industrial firms have departments 
that compile statistical information along their particular lines. 
Among organizations which have done this are banks; broker- 
age and banking houses; local boards of trade, chambers of com- 
merce, stock and produce exchanges, etc.; life and fire insur- 
ance companies; telegraph and telephone companies; makers 
of tires and tire fabrics; locomotive manufacturers; mercantile 
establishments purveying various lines of goods; construction 
companies; makers of bank note paper; hardware and brass 
manufacturers; gas, electric light, heat, and power companies; 
cotton and silk manufacturers; street and interurban railways; 
pottery makers; manufacturers of farm implements; tobacco 
manufacturers; makers of storage batteries and general 
electric equipment; research laboratories; manufacturers of 
soda ash; munitions makers; paper pulp makers; taxicab 
companies. This is only a partial list, chosen more or less 
at random; yet the diversity of interests represented in the 
list clearly indicates the extent to which the modern business 
world is utilizing statistics. 

Prominent among individual corporations that have es- 
tablished statistical organizations are the steam railways of 
the United States. Hardly one of the larger railways but 
maintains a statistical office of some kind, either in connection 
with its accounting department or as an entirely separate 
branch. These statistical offices usually prepare the annual 
and other periodical reports required by various governmental
agencies; many of them compile much additional information 
for the executives in charge of their operations. Some of 
the large roads carry this statistical work to a very high
degree of detail and completeness. From these the list shades
down to roads which have no statistical force at all, so called, 
yet are compelled by force of law or circumstances to do a 
certain amount of statistical work. 

A third type of private organization engaged in compiling 
business statistics consists of the group established primarily 
for the purpose of collecting and distributing, in digested 
form, the fundamental statistics of the business or financial 
world. I refer to the statistical or economic services that
prepare statistical charts, barometric diagrams, and the like, 
for the information of their clients, who are usually business 
men with a keen interest in industrial possibilities. Business 
men who subscribe to these services are often of the class 
who can not afford to maintain a separate statistical organ- 
ization, or even to keep a statistician on whole or part time, 
but who can afford to pay a reasonable annual fee for such 
material as is prepared by the services. 

One organization of this type has been doing business since 
1902, and another since 1906, although the larger development 
of this kind of service has been comparatively recent. If I 
understand the underlying principle of this class of services 
aright, it is that business conditions move in cycles; that these 
cycles are not of uniform length or duration; yet that there 
are certain positive signs of the progressive development of 
each cycle, which can be observed only by means of a careful 
charting of the business and financial situation; that by ac- 
curate reading of these signs warning can be given of coming 
events—whether good or bad. 

Perhaps a brief description of the kind of work done by 
organizations of this kind will give the clearest impression 
of their nature, and the significance of their statistical work. 
One service publishes periodical reports designed to “give
merchants, bankers, and investors carefully collected and condensed
facts about business and finance.” Among these
reports are a so-called “weekly barometer letter,” analyzing
and forecasting fundamental probabilities in business, and 
indicating the position of business and finance in the current 
cycle; a monthly desk sheet, giving basic statistics as to new 
building, bank clearings, railway operations, crops, etc., designed
to indicate the fundamental condition of American
business; a periodical report on mercantile and commodity 
price conditions. 

Another service, using practically its own words to describe 
its activities, brings together, in a barometer chart, composite 
cycles indicating the actual condition of business and finance 
and forming a basis for forecasting any impending change; 
serves as a clearing house for a great mass of statistical data, 
some of which is fundamental and some only of record signif- 
icance, but all of which can be and is applied to solve the 
problems of business men, bankers, private investors, and 
the like; compiles and analyzes special statistics for separate 
industries. 

Still another service is called an “investors’ service,” which
supplies to clients a number of periodical statistical statements 
on financial conditions, investments, general business, rail- 
ways and industrial corporations, and the like. This service 
also issues a monthly business barometer, based on the belief 
that “all things move more or less in cycles, and to under- 
stand these cycles involves a continuous study and analysis 
of fundamental business conditions.” 

I have devoted what may seem like a disproportionate 
amount of attention to the work of these statistical services 
or clearing houses, not because I am in any way interested 
in their welfare, but because they seem to be making an effort, 
at least, to live up to the Westergaard motto with which we 
opened this paper. In other words, they are taking a great 
mass of statistical material already available, gathered by 
other persons or organizations, and are endeavoring to master 
and digest it, so as to be of the greatest service to the greatest 
number. The work of all organizations of this sort is still 
necessarily somewhat in the pioneer stage. Part of their 
future development will doubtless come from the internal 
growth and improvement of the organizations themselves, 
and part from a greater willingness of industrial concerns 
to furnish figures long regarded as trade secrets. However 
it may come, development of work of this kind appeals to me 
as being distinctly along the line of making business statistics 
really serviceable to society.

In addition to services of the kind I have described, there
are many organizations and individuals engaged in the same 
sort of work, although on a smaller scale. Some men hold 
themselves out as expert compilers or statisticians, consulting 
statistical experts, and the like. In much of this work there 
is unquestionably room for improvement; but there is definite 
promise for the future in the development of all organizations 
designed to make statistics serviceable to the business world. 


UTILIZATION OF STATISTICS. 


We come now to the second branch of our inquiry. The 
first dealt with the growth of organizations for the compilation 
of business statistics. The second will be devoted to the 
use actually made of the statistics when compiled. 

It is clear, almost without reflection, that a business man or 
corporation will use statistics to guide him in his business 
decisions; to tell him when to expand his activities and when 
to contract them; to inform him in what sections to look for 
the safest and quickest results from a sales campaign; to help 
him decide as to the advisability of a particular proposition; 
in fact, to guide him daily in the various decisions that are 
constantly calling for attention. 

Further reflection will make the possibilities along these 
lines stand out even more definitely. Suppose we illustrate 
these possibilities by a few concrete examples from the business 
world of today. Take, for example, corporations maintaining
a national chain of stores, such as the United Cigar Stores, 
the F. W. Woolworth Company, or any other of similar nature. 
It is clear that these corporations must keep in close touch with 
statistics of supply and especially of demand, both by means 
of reports secured direct from their constituent stores and by 
observation of general market and purchasing conditions. 
Going a step further, the newly organized American Interna- 
tional Corporation must now have, or must certainly establish, 
a statistical organization covering not only the situation in 
the United States but also the whole civilized world as well. 
Again, every bank and banking house must keep in touch 
with business and financial conditions and possibilities by 
statistical means. Whether banks maintain a statistical
department or not, they are increasingly forced to keep track
of the future by means of statistics. 

Similarly, all business and mercantile houses keep an eye 
to the future by watching the business barometer for their 
particular line of activity. A certain hardware manufacturing 
corporation of the middle west, for example, compiles statis- 
tics of crop and farm conditions, as a clue to the purchasing 
power of the agricultural districts. Trade statistics, banking 
and financial statistics, agricultural statistics, and the like are 
all a part of the equipment of modern business houses. 

Perhaps the greatest development of the utilization of 
business statistics on a large scale is in the railway industry. 
One reason for this is the immensity of the industry, and the 
complexity of its functions. Another is the character of the 
railway organization, the necessity of its activities being 
spread over a wide area, and the difficulty of keeping in touch 
with all parts of the system without detailed statistical reports. 
Although the manager of a large industrial plant keeps in daily 
personal contact with his departmental and division chiefs, 
a railway general manager has his territory spread over thou- 
sands of square miles of area, and can keep in touch with his 
plant only by a system of daily reports. Hence arise the
detailed reporting methods in vogue on all railway systems, 
whereby the executives receive daily statistical reports from 
their subordinates, covering all features of current operations. 

A friend of mine tells with amusement of his conversation 
with a railway official. The official, in describing the work 
done by the various men in his organization, said the division 
superintendent’s duties consisted in reporting to the general 
superintendent, the duty of the general superintendent was 
to report to the general manager, the general manager reported 
to the vice-president in charge of operations, and the vice-president
reported to the president. “And to whom does
the president report?” asked my friend. “He reports to the
board of directors,” was the reply. On the face of it, these
railway officials expended the whole of their energies merely 
preparing and presenting reports to their chiefs next in com- 
mand. As a matter of fact, the incident has significance in 
showing how thoroughly the system of accurate records is
woven into the warp and woof of the railway business, and to
a considerable extent of other industrial activity. No official 
could make an intelligent report to his superior were he not 
himself conversant with every phase of his work; and if his 
own work is going wrong at any point, he is not likely to turn 
in a report until the matter is rectified. In other words, the 
necessity of reporting daily the status of your work keeps you 
alert at all times regarding its progress and its effectiveness.


SUMMARY 


We have seen to how great an extent has developed the 
practice, on the part of industrial and financial organizations, 
of establishing statistical offices with the primary object of 
compiling business statistics of utility in their work. We
have seen, further, that these statistics are put to use in num- 
erous ways, but chiefly as guideposts along the path of indus- 
trial development and progress. These guideposts may 
serve as warnings in time of impending stress and storm, or 
as incentives to development during periods of expansion and 
optimism. We have noted the fact that the more complex 
the industry, the more serviceable is a system of statistics 
detailing its activities. Especially is this true of the railway 
industry, with its operations spread out over so vast an area. 

Much the greater part of this statistical development has 
gone on during the past ten years. No one who has been 
working in the field of business statistics during recent years 
can fail to have been impressed by the tremendous growth in 
the use of statistics in the industrial world. The statistical 
services I have described date back hardly more than ten years; 
they have come into being as the result of a definite need that 
has only recently been asserting itself. 

Take one phase of statistical growth in the railway world. 
The use of statistics in a wage arbitration was hardly thought 
of ten years ago. In 1910, in a case involving the switchmen 
of a group of railways, the employees put on the stand an 
expert statistician armed with a considerable array of statis- 
tical exhibits. From that time, the use of statistics in rail- 
way wage arbitrations has grown to tremendous proportions, 
costing hundreds of thousands of dollars in a single arbitration,
and running the whole gamut of possible subjects. I
do not unqualifiedly endorse this particular form of growth in 
statistical activity, much of which I consider wasteful and 
useless, but am merely recording it here as a matter of his- 
torical development. My emphasis is on the recent and 
consistent growth of this kind of statistical work. Again, 
the use of statistics and statistical exhibits in railway rate 
cases has been greatly developed during the past six or eight 
years. 

To say that industrial complexity will increase with the 
future development of economic activity is to point out, in 
another way, the growing need of the future for accurate sys- 
tems of statistical records and compilations. Along what 
lines the development will proceed is beyond the scope of
the present discussion to conjecture; suffice it that the develop- 
ment must be steady and effective if it is to measure up to the 
growing need for accurate information. And herein lies the 
mission of the statistician of today and the near future: So 
to develop the statistical methods and results of the present 
that they may meet the increasingly complex needs of future 
industry. We must especially train ourselves, as students
of business problems, to master and digest the material made
increasingly available to us, that we may make it of the greatest
utility to the business world of the morrow.


THE “RATIO” CHART


FOR PLOTTING STATISTICS.


By Professor Irving Fisher, Yale University.


COMPARING TWO MAGNITUDES BY THEIR DIFFERENCE AND BY 
THEIR RATIO. 


In the last few years there has been a great increase in the 
appreciation and use of statistical charts. As a consequence, 
a number of efforts have been made to improve the technique 
of graphic representation in order that the chart may convey 
its message more quickly, accurately, and unmistakably. 
Toward this end Mr. Willard C. Brinton in particular has put 
us all under obligation.* 

In the present article, I shall describe the nature and ad- 
vantages of one form of chart on which to plot a statistical 
curve,—by which is meant any curve or broken line showing 
the different numerical values of a statistical magnitude at 
different periods of time. This chart will be found of very 
great assistance when the only object or the chief object is to 
display and compare ratios. It may therefore be called the 
“ratio” chart.†

The chief reason why ratio charting has not yet been more 
widely used is, I believe, that its extreme simplicity is not yet 
realized. Those who have mentioned it in print have usually 
contented themselves with stating dogmatically how it is to 
be used without explaining the whys and wherefores, further 
than that its spacing is “logarithmic” like that of a log- 
arithmic slide rule. But most persons regard logarithms and 
slide rules as a species of magic and fight shy of a method, the 
foundations of which they do not clearly understand. 

* Through the publication of his “Graphic Methods for Presenting Facts,” New York (The Engineering
Magazine Company), 1914, 371 pp., and through the work of the Joint Committee on Standards for
Graphic Representation which grew out of the publication of his book.

† I am not, as I originally supposed, the first to hit on this simple device, although it still remains
almost wholly unknown and unused and has never yet, so far as I know, been adequately described. A
bibliography of the very meagre literature on the subject is given at the end of this article. It is there
shown that the method in essence was used as early as 1863.

The object of this article is to make the construction of
the ratio chart clear to any reader and, at the same time, to
point out in some detail its various uses and advantages. In
doing this it is quite unnecessary to use even the concept of
logarithms.

FIG. 1d. FUTURE HYPOTHETICAL POPULATION OF THE UNITED STATES. ORDINARY OR DIFFERENCE METHOD.
Equal vertical intervals represent equal statistical differences. A line ascending at a uniform ratio is
curved. Uniformity is therefore not evident to the eye.

FIG. 1r. The same assumption. Uniformity is here evident to the eye, being represented by a straight line.

We may compare any two magnitudes, of like kind, by means
either of their difference or of their ratio. In the first kind
of comparison “an inch on the end of one’s nose” is exactly as
much as an inch added to the height of the Washington Monument;
in the second kind of comparison, on the other hand, “an inch
on the end of one’s nose” is an addition of about 40 per cent.,
or as much as 220 feet added to the height of the Monument.

The ordinary chart is adapted to difference comparisons
rather than to ratio comparisons, whereas the statistician
is usually concerned with ratio comparisons far more
than with difference comparisons.
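
The arithmetic behind the illustration may be checked by a short sketch. The nose length of 2.5 inches and the Monument height of 555 feet are assumptions supplied here for illustration; the text states only the resulting figures of about 40 per cent. and about 220 feet.

```python
# Difference versus ratio comparison, after the nose and Monument illustration.
# Assumed figures (not given in the text): a nose 2.5 inches long and a
# Monument 555 feet high.
NOSE_INCHES = 2.5
MONUMENT_FEET = 555.0


def ratio_increase(added, base):
    """Return the addition expressed as a fraction of the base magnitude."""
    return added / base


# One inch added to the nose, expressed as a ratio of its length.
nose_ratio = ratio_increase(1.0, NOSE_INCHES)

# The same ratio applied to the Monument, expressed in feet.
equivalent_feet = nose_ratio * MONUMENT_FEET

print(f"ratio increase of the nose: {nose_ratio:.0%}")
print(f"equal ratio on the Monument: {equivalent_feet:.0f} feet")
```

As a difference the inch is the same in both cases; as a ratio it corresponds, under these assumed dimensions, to somewhat over 220 feet on the Monument.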


CONVERTING ORDINARY PLOTTING PAPER.


Fig. 1 d (“d” standing for “difference”) shows an ordinary
plotting chart; that is, a chart with equal vertical spacing,
labelled for population in millions, the labelling being for
equal statistical differences, and each interval of vertical
ascent representing an increase of 10 millions over the preceding.

Since the key-idea of the ratio chart is that equal vertical 
intervals represent equal ratios of increase instead of equal 
differences of increase, it may also be constructed from ordi- 
nary cross ruled plotting paper. This may be done in two 
steps, (1) labelling the existing lines and (2) interpolating 
new lines. The first step is to label the existing (equidistant) 
horizontal lines with numbers increasing in a given ratio. 

Fig. 1 r (“r” standing for “ratio”) shows a ratio chart
made from ordinary ruled paper simply by labelling the hori- 
zontal lines for ratios, each interval of vertical ascent now rep- 
resenting an increase of 10 per cent. over the preceding. 
That is, while the first chart is labelled vertically, 100, 110, 
120, 130, 140, 150, 160, 170, 180, 190, 200, etc. (each number
being 10 more than the one below) the second chart is labelled 
vertically, 100, 110, 121, 133, 146, 161, 177, 195, etc.* (each 
number being 10 per cent. more than the one below). 
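
The two systems of labelling may be generated mechanically. The following is a minimal sketch; the function names and the rounding to whole numbers are assumptions made for illustration, the rounding being chosen to match the labels quoted above.

```python
def difference_labels(start, step, count):
    """Labels for equidistant lines that differ by a constant amount."""
    return [start + step * k for k in range(count)]


def ratio_labels(start, rate, count, first=0):
    """Labels for equidistant lines that differ by a constant ratio.

    `first` is the index of the lowest line relative to `start`;
    a negative value carries the numbering backward below `start`.
    """
    return [round(start * (1 + rate) ** k) for k in range(first, first + count)]


print(difference_labels(100, 10, 8))   # -> [100, 110, 120, 130, 140, 150, 160, 170]
print(ratio_labels(100, 0.10, 8))      # -> [100, 110, 121, 133, 146, 161, 177, 195]
print(ratio_labels(100, 0.10, 3, -2))  # -> [83, 91, 100]
```

The last call carries the ratio labels backward below 100, giving 91 and 83, and could be continued downward indefinitely without reaching zero.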

The ratio method consists simply in plotting any statistical
curve by using the labels† of Fig. 1 r instead of those of Fig. 1 d.

Thus, let us plot, say, the (imaginary) population of the 
United States in millions, beginning with 100 millions in 1910 
and assuming an increase of 10 per cent. every decade. Since 
all increases of 10 per cent. are represented in Fig. 1 r by equal 
vertical distances, it is clear that the curve representing
population, i. e., connecting the series of points in Fig. 1 r lying
diagonally on the cross lines, will be a straight line.

Thus we see the first merit of the ratio method of charting 
the growth of a statistical magnitude: uniformity in the per- 
centage rate of growth is pictured by straightness in the plotted 
line. 
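A minimal numerical sketch of this point, assuming the imaginary
population just described (100 millions in 1910, rising 10 per cent.
each decade). The variable names are illustrative only, and the common
logarithm is taken here as the measure of height on the ratio chart.

```python
import math

# Imaginary population in millions: 100 in 1910, rising 10 per cent. per decade.
years = [1910 + 10 * k for k in range(8)]
population = [100 * 1.10 ** k for k in range(8)]

# Heights on a difference chart are the numbers themselves;
# heights on a ratio chart are proportional to their logarithms.
ratio_heights = [math.log10(p) for p in population]

# Successive vertical steps on each kind of chart.
difference_steps = [b - a for a, b in zip(population, population[1:])]
ratio_steps = [b - a for a, b in zip(ratio_heights, ratio_heights[1:])]

for year, p in zip(years, population):
    print(year, round(p))
print("difference steps:", [round(s, 1) for s in difference_steps])
print("ratio steps:     ", [round(s, 4) for s in ratio_steps])
```

The difference steps grow from decade to decade, while the ratio steps
are all equal; equal steps at equal intervals of time are what make
the plotted line straight.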

On the other hand, in the ordinary or difference chart, such as that
of Fig. 1 d, the same population growth will be represented by an
“exponential curve” and its uniformity of percentage rate of growth is
altogether lost to the eye. In fact the uninitiated are apt to be
misled and to falsely infer from such a curve that the rate of growth
is increasing.

*If we carry the numbers backward, that below 100 will be 91 (for
91:100::10:11); that next below 91 will be 83; and so on,
indefinitely, down to any number, however small, except zero.
Evidently we can [...] there is no base or zero [...]

†Of course in the ratio chart as in the difference chart the numbering
may be magnified or reduced in any ratio. Thus instead of the numbers
100, 110, 121, etc., we may substitute 1.00, 1.10, 1.21 or .100, .110,
.121, etc., etc.


FIG. 2d. SHOWING GEOMETRICAL (A) AND ARITHMETICAL (B) PROGRESSIONS.
DIFFERENCE METHOD.

A line (A) ascending at a uniform ratio in equal periods of time, that
is, in geometrical progression, is curved (upward). A line (B)
ascending by equal differences in equal periods of time, that is, in
arithmetical progression, is straight. Note that vertical intervals in
geometrical progression (see the light figures) are unequally spaced.


FIG. 2r. THE SAME. RATIO METHOD.

A line (A) ascending at a uniform ratio in equal periods of time, that
is, in geometrical progression, is straight. A line (B) ascending by
equal differences in equal periods of time, that is, in arithmetical
progression, is curved (downward). Note that vertical intervals in
geometrical progression (see the light figures) are equally spaced.


RERULING THE RATIO 
CHART SO OBTAINED. 


But the ratio chart, as described above, is not very convenient for
plotting points between the ruled lines, because so few of the labels
are round numbers. To make a full fledged ratio chart, new horizontal
lines need to be interpolated, at their proper places, corresponding
to the round numbers 120, 130, 140, 150, etc. (Fig. 2 r) and the
original equi-distant ones should be erased.* The contrast, then,
between the ratio chart and the ordinary, or difference, chart is
simply one of spacing, the ratio chart (as in Fig. 2 r) having the
numbers 120, 130, 140, etc. unequally spaced whereas the difference
chart (as in Fig. 2 d) has these same numbers equally spaced.

*Plotting paper for ratio charts may now be obtained ready-made from a
few commercial firms: the Educational Exhibition Company, 26 Custom
House Street, Providence, R. I.; John Wenzel, 63 West 107th Street,
New York City; Keuffel and Esser Company, 127 Fulton Street, New York
City; and the Standard Graph Co., 32 Union Square, New York City.

The unequally spaced horizontal lines shown in Fig. 2 r become more
and more crowded together as we ascend, until it becomes necessary, or
advisable, to omit some of them. The omission of these lines causes
the plotting paper to present the appearance of an intermittent or
cyclical spacing (Fig. 3 r).

The above description of a method of forming a ratio chart will serve
to explain its nature as representing equal successive percentages of
increase by equal intervals on the chart.


FIG. 3r. SCALES OF ELEVATIONS AND SLOPES. RATIO METHOD.

Showing, at the left, the elevations (and depressions), representing
various ratios of increase (or decrease), and showing, at the right,
the slopes representing various per annum rates of increase (or
decrease).

In Fig. 3 r we see the full fledged ratio chart, constituting a ruling
arrangement convenient for plotting. To familiarize the reader with
the graphic representation of ratios, some dark vertical lines are
drawn. These show how far up to go to represent an increase
respectively of 10, 25, 50, and 100 per cent. (or 2-fold), 5-fold and
10-fold; while the next lines to the right show how far down to go to
represent a decrease of 10, 25, 50 (i. e., “half of”), and 90 per
cent. Any two points on the chart, however far removed from each
other horizontally, if the vertical interval between them is equal to
the 10 per cent. line, will be such that the statistical magnitude
represented by the upper point will be 10 per cent. greater than that
represented by the lower. Again the line representing “100 per cent.
increase or 2 fold” is the vertical distance between any two points on
the chart of which the upper stands for twice the statistical number
for which the lower stands.

The reader is advised to verify these and the other legends 
by taking any two points at random, reading the figures 
opposite them in the margin and comparing these figures. 

He may also familiarize himself with the sloping lines at the right
ascending respectively at the rates of 1, 2, 5, 10, 25, 50, 100 per
cent., five fold, and ten fold per annum, as well as those descending
at the rates of 1, 2, 5, 10, 25, 50, and 90 per cent. per annum, all
drawn radiating from a center through the appropriate points on a
vertical line one unit to the right of that center.
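The two scales of Fig. 3 r can be sketched numerically. The unit of
height used here (the distance from 1 to 10 on the chart) and the
function names are assumptions made for illustration; any other unit
would only rescale every figure alike.

```python
import math

def elevation(ratio):
    """Vertical distance on a ratio chart for a given ratio,
    in units of the distance from 1 to 10 (an assumed unit)."""
    return math.log10(ratio)

def slope_per_annum(rate):
    """Rise per one year of horizontal distance for a growth rate per annum,
    e.g. rate=0.10 for 10 per cent., rate=-0.50 for a 50 per cent. decrease."""
    return elevation(1.0 + rate)

increases = [("10 per cent.", 1.10), ("25 per cent.", 1.25), ("50 per cent.", 1.50),
             ("100 per cent. (2-fold)", 2.0), ("5-fold", 5.0), ("10-fold", 10.0)]
decreases = [("10 per cent.", 0.90), ("25 per cent.", 0.75),
             ("50 per cent. (half of)", 0.50), ("90 per cent.", 0.10)]

for label, ratio in increases:
    print(f"increase of {label:24s} up   {elevation(ratio):.4f}")
for label, ratio in decreases:
    print(f"decrease of {label:24s} down {-elevation(ratio):.4f}")
```

Halving lies exactly as far below a point as doubling lies above it,
and a decrease of 90 per cent. is as far down as a 10-fold increase is
up, which is why the chart needs no zero line.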


REPRESENTING GROWTH AT A CONSTANT, OR NEARLY 
CONSTANT, RATIO. 


The advantages of the ratio chart over the difference chart
are many. They may be seen from a few illustrative examples.

We have already seen (Fig. 1 r) that mere straightness of 
the plotted line indicates, in the ratio chart, a uniformity in 
the percentage rate of growth, whereas in the difference chart 
such uniformity is represented by an exponential curve (Fig. 1d). 

In Fig. 2 d and Fig. 2r the same lines are repeated and 
labelled A. The line A represents a uniform percentage rate 
of growth. It is an exponential curve in the difference chart 
(Fig. 2 d) and a straight line in the ratio chart (Fig. 2 r). 

For contrast a straight line B is drawn on the difference chart
(Fig. 2 d). Its straightness signifies little, merely that the same
absolute difference is added each year. But the same absolute
difference is a decreasing percentage rate of growth and this fact is
clearly interpreted in the ratio chart (Fig. 2 r). We may say that the
chief or typical contrast between the two charts is that a straight
line represents in the one an arithmetical progression and in the
other a geometrical progression; which is only another way of saying
that it represents a progression by equal differences in the one and
by equal ratios in the other.


FIG. 4d. UNIFORM PERCENTAGE RATE. DIFFERENCE METHOD.
Showing uselessness of curve at extreme ends; also showing how to
compare percentage slopes at different points by comparing (inversely)
the subtangents for these points. See p. 596. (Horizontal axis:
years.)

A serious fault in the difference method is that, in a curve of rapid
growth, the difference chart is useful only in the middle portion. At
the extreme left such a curve, e. g., an exponential, or uniform
percentage, curve (as in Fig. 4 d) becomes almost indistinguishable
from a horizontal line and at the extreme right it becomes almost
indistinguishable from a vertical line. At either extreme no eye can
estimate the percentage rate of growth although that rate may not be
different from the rate in the middle.


FIG. 5d. ACTUAL POPULATION OF THE UNITED STATES. DIFFERENCE METHOD.
Showing the impossibility of correctly comparing rates of increase at
different periods.

FIG. 5r. THE SAME. RATIO METHOD.
Showing clearly the slight deviations, since 1860, from a uniform rate
of growth.

A vast number of statistical charts represent rapid and long 
continued growth,—the statistics of a prosperous business 
plotted from the beginning; the statistics of a growing country; 
the statistics of new inventions. We have merely to mention 
any such familiar examples as statistics of population, wealth, 
crops, mining, manufacturing, railway mileage, telephones, 
automobiles, bank deposits, new building, sales of stocks, war 
debts and other magnitudes rapidly increasing since the war 
began, in order to realize the thousands or rather, probably, 
the millions of statistical charts which have been constructed of 
this kind, most of which are nearly useless at either end. 

Thus Fig. 5 d represents the actual growth of the population 
of the United States. The reader, however discerning or 
experienced, cannot discover by mere ocular inspection of this 
curve whether the increase at the end is faster or slower than 
at the beginning. Fig. 5 r, on the other hand, shows the facts
desired at a glance. It shows a uniform rate of growth be- 
tween 1790 and 1860 and slight but evident changes since the 
last named date. 

One great advantage of the ratio chart is in forecasting. 
Usually, in business, we forecast by assuming a certain ratio of 
growth. In the ratio chart we have simply to draw a straight 
line; usually, in fact, merely to produce the one already drawn 
representing the rate experienced in the past. 
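Producing the straight line can be sketched in figures as well as on
paper. The sales figures below are hypothetical, and fitting the line
by least squares is an assumption made for illustration; on the chart
itself one would simply lay a ruler along the plotted points.

```python
import math

def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical annual sales of a growing business (any unit).
years = [1912, 1913, 1914, 1915, 1916]
sales = [200.0, 236.0, 285.0, 338.0, 402.0]

# On the ratio chart the heights are logarithms, so the line is fitted to them.
slope, intercept = fit_line(years, [math.log(s) for s in sales])
annual_rate = math.exp(slope) - 1  # percentage rate implied by the line's slope

def forecast(year):
    """Read the produced straight line at a later year."""
    return math.exp(intercept + slope * year)

print(f"rate experienced: {100 * annual_rate:.1f} per cent. per annum")
print(f"forecast for 1917: {forecast(1917):.0f}")
```

The forecast carries no more authority than the assumption behind it,
namely that the ratio of growth experienced in the past continues.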

One method by which users of the ordinary chart have attempted to
meet its shortcomings in representing statistics of rapid growth is
first to draw a “growth axis,” or exponential curve [such as that in
Fig. 1 d, Fig. 2 d (curve A), or Fig. 4 d] ascending at the average
rate of growth of the statistics under consideration; then calculate
the per cents. of deviation each year from this “growth axis”; and
finally, plot these deviations on a separate chart.

This procedure, however, involves, in addition to charting 
the original statistical figures, much subsidiary calculation and 
charting; and the results, when obtained, are not as exact as 
the results obtainable more easily by the ratio method. 







FIG. 6d. EQUAL RATES OF GROWTH APPARENTLY UNEQUAL. DIFFERENCE METHOD. 


Segments of two curves ascending at equal rates but having unequal slopes and therefore deceiving 
the eye. 


COMPARING CURVES.





If we wish to compare the growth of population in two countries of
different sizes, such as Canada and the United States, the ordinary or
difference plots, of which we shall assume that AA’ and BB’ in
Fig. 6 d are small sections for a given year, will give the impression
that the upper curve, i. e., that for the larger country, is ascending
at a much faster ratio than the lower, i. e., that for the smaller
country. Few would suspect that the two lines AA’ and BB’ represent
precisely the same percentage rate of growth.

In Fig. 6 r, on the other hand, the equality of these same two rates
of growth is clearly indicated by the parallelism of the two lines
AA’ and BB’.


FIG. 6r. EQUAL RATES OF GROWTH EVIDENTLY EQUAL. RATIO METHOD.
The two segments are here parallel.


Again, if the plots of the two populations appear as in Fig. 7 d, most
persons, observing the parallelism of the two lines, would jump to the
conclusion that the two populations are growing at the same percentage
rate. The fact is that the lower line, AA’, is ascending at a greater
percentage rate than the upper, BB’. The ratio chart (Fig. 7 r) shows
this at a glance.
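The two deceptions can be put in figures. The populations below are
hypothetical, chosen only to reproduce the situations of Fig. 6 and
Fig. 7; the function names are likewise illustrative.

```python
import math

def difference_slope(start, end):
    """Rise of a one-year segment on a difference chart (units of the data)."""
    return end - start

def ratio_slope(start, end):
    """Rise of the same segment on a ratio chart (common logarithm of the ratio)."""
    return math.log10(end / start)

# Case of Fig. 6: both populations grow 10 per cent. in the year
# (hypothetical figures, in millions).
small_6, large_6 = (8.0, 8.8), (100.0, 110.0)

# Case of Fig. 7: both populations gain the same absolute amount
# (hypothetical figures, in millions).
small_7, large_7 = (8.0, 10.0), (100.0, 102.0)

for name, small, large in [("Fig. 6", small_6, large_6), ("Fig. 7", small_7, large_7)]:
    print(name,
          "difference slopes:", difference_slope(*small), difference_slope(*large),
          "ratio slopes:", round(ratio_slope(*small), 4), round(ratio_slope(*large), 4))
```

In the first case the difference slopes differ while the ratio slopes
agree; in the second the difference slopes agree while the ratio
slopes show the smaller population growing at the faster rate.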
In the difference chart, in order to make even a rough eye-estimate
of the comparative percentage rates of growth of AA’ and BB’ (Fig. 6 d
or Fig. 7 d), we must: (1) note the position of the zero or base-line,
XX’; (2) mentally compare A’X’ with AX and B’X’ with BX; and (3)
compare the two comparisons.


FIG. 7d. UNEQUAL RATES OF GROWTH APPARENTLY EQUAL. DIFFERENCE METHOD.
Segments of two curves ascending at equal slopes, but at unequal rates
and therefore deceiving the eye.

Such mental operations are difficult, 
irksome, and inaccurate, especially if, as 
is not uncommon, the base or zero line 
has been omitted in order to economize 
space. Furthermore they involve shut- 
ting our eyes to the slope or steepness 
of the lines AA’ and BB’, the very 
feature which first attracts attention. 

Often, in fact, the bottom part of the chart, containing the base
line, is cut off, and sometimes, instead of using the same base line
for two curves on the same chart the draughtsman will bring one curve
nearer the other by using two separate base lines. In either event the
result is misleading or confusing and in such cases it is almost
hopeless to obtain any clear idea of the comparative percentage
growths except by recourse to tedious arithmetical computations. Thus
in Fig. 8 d the curves A and B seem to be exactly similar. But they
are far less similar than they appear; for the curve A is relative to
a remoter base than the curve B. If plotted on the same base and scale
as B, the curve A becomes A’ and its similarity to B is greatly
diminished. But even this degree of similarity is greater than the
statistics warrant, as we see when using the ratio chart (Fig. 8 r).
This shows the exact degree of similarity of the two curves A and B
which turns out to be very small. Thus the ratio chart is an effective
means of avoiding juggling with statistics through base-selection or
scale-selection.


FIG. 7r. UNEQUAL RATES OF GROWTH EVIDENTLY UNEQUAL. RATIO METHOD.
The two segments are here of different slopes and the steeper slope
indicates the greater rate of growth.

FIG. 8d. JUGGLING WITH BASES AND SCALES. DIFFERENCE METHOD.
Curves A and B apparently vary in exact [...]; but if A is changed to
the same base and scale as that employed for B, it becomes A’, which
corresponds much less closely to B. The comparison between A and B is
misleading and even the comparison between A’ and B is not exact.

FIG. 8r. THE SAME. RATIO METHOD.
The comparison between A and B here exactly represents the facts,
showing only a faint resemblance, in marked contrast with the first
comparison in the preceding figure.

Fig. 9 d and Fig. 9 r, taken from a brief article of mine in 
the New York Times Annalist of March 17, 1917, show a type 
of error opposite to the foregoing, and one taken from actual 
statistics. Fig. 9 d gives the impression that the prices of 
breadstuffs have fluctuated less than the prices of ‘‘all com- 
modities.’’ On the ratio chart, however, as shown in Fig. 9 r, 
although exactly the same numbers are plotted, it is seen 
that breadstuffs have actually fluctuated a trifle more violently 
than “all commodities.” 

For comparing in detail any two curves on ratio charts, 
such as those in Fig. 9 r, we may do what we have just seen is 
not properly permissible on a difference chart,—we may move 
bodily either curve, the upper curve downward or the lower 
curve upward, until the two are close together. Then the 
various degrees of parallelism or divergence, at various periods 
of time, may be seen with the utmost clearness. This is done 
in Fig. 10 r.* Such close comparison will usually give quickly,
through the eye, a better practical picture, I think, of the 
degree of correlation and certainly of the location of the corre- 
lation, than can be obtained even by laborious calculations of 
coefficients of correlation. 

Index numbers with widely different bases, such as those of Sauerbeck
and of the United States Bureau of Labor, if plotted on difference
charts without especially selecting the scales or without multiplying
or dividing by suitable constants, appear far more different from each
other than they really are. Ratio plotting will show their exact
similarities and differences.

*The scale on the right applies to the lower curve, that on the left
to the upper.

As was said at the outset, difference charts are useful when 
difference comparisons are wanted. Not only are such cases 
rare, however, but often difference comparisons between curves 
are meaningless, and ratio comparisons are alone possible. 
This is the case when two curves represent incommensurable 
magnitudes as, for instance, when a curve of railway gross 
receipts is compared with one of population or one of the cir- 
culation of money with one of the price level. In such cases 
difference plotting involves a very dangerous discretion in 
selecting the scale and the base in order neither to exaggerate 
nor to understate the degree of correspondence. For instance, 
the close similarity in the recent changes in the money in 
circulation in the United States, on the one hand, and the 
price level, on the other, was not evident to me until I plotted 
the two curves on a ratio chart (Fig. 11 r). This similarity 
might be almost overlooked on a difference chart if the scales 
for dollars and index numbers were selected arbitrarily. 

One special, though slight, advantage of the ratio chart is in 
representing more clearly than in the difference chart the 
results of multiplying or dividing a statistical series of numbers 
by a constant, or by another series, and in exhibiting the 
resultant third series relatively to the other two.* 
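The relations set out in the footnote to this paragraph all follow
from the heights on a ratio chart being logarithms. A minimal sketch,
with hypothetical figures for wheat lands; the function `height` and
the choice of the common logarithm are assumptions for illustration.

```python
import math

def height(value):
    """Signed distance from the unity line on a ratio chart (log scale)."""
    return math.log10(value)

# Hypothetical series for four successive years.
acres_per_capita = [0.50, 0.55, 0.60, 0.58]
bushels_per_acre = [12.0, 13.5, 12.8, 15.0]
bushels_per_capita = [a * b for a, b in zip(acres_per_capita, bushels_per_acre)]

# (1) Multiplying by a constant shifts the whole curve by a constant distance.
shift = [height(3.0 * b) - height(b) for b in bushels_per_acre]

# (2) Reciprocals give the mirror image about the unity line.
mirror = [height(1.0 / b) + height(b) for b in bushels_per_acre]

# (3) The product curve stands at the sum of the other two distances.
product_gap = [height(p) - (height(a) + height(b))
               for p, a, b in zip(bushels_per_capita, acres_per_capita, bushels_per_acre)]

print("constant shift:", [round(s, 6) for s in shift])
print("mirror residue:", [round(abs(m), 6) for m in mirror])
print("product residue:", [round(abs(g), 6) for g in product_gap])
```

The shift is the same in every year, and the two residues vanish apart
from rounding; the rule for quotients follows in the same way with a
difference in place of the sum.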


ATTEMPTS TO MEND THE FAULTS OF THE DIFFERENCE CHART. 


One attempt to secure better comparability of curves in ordinary
plotting is that customary in handling index numbers, namely to reduce
all prices to percentages of the base-period price, so that we may
start all the curves at an even 100 per cent. If the curves do not
greatly diverge this gives approximately correct results. But it is a
makeshift method and never gives the absolutely exact comparisons of
the ratio method; and, after the curves have diverged considerably,
their correct comparison becomes difficult.

*In detail: (1) to multiply or divide by a constant any series of
numbers plotted in a curve is merely to shift the curve bodily up or
down by the appropriate constant distance; (2) the reciprocals (of a
statistical series plotted in a curve) form a second curve symmetrical
(relatively to the unity line) to the first curve; in other words two
curves representing reciprocal numbers are such as to coincide if the
chart is folded over the unity line; thus a plot representing an index
number of prices will be the exact reverse of that representing the
purchasing power of money, the one ascending whenever the other
descends and at the same angle; (3) the curve formed by plotting the
products of two series will be such that its distances from the unity
line will be the sum of the distances of the other two from that line;
(4) analogously, a quotient of two curves is distant from the unity
line the difference of their distances (i. e., distance of the
dividend curve less that of the divisor curve). Thus if we plot a
curve of acres per capita of wheat lands and another of bushels per
acre and then plot the curve of their products, i. e., of bushels per
capita, the three curves will be seen to be related as just described;
the same would apply to index numbers of prices, of wages, and of
purchasing power of wages (“real wages”).


FIG. 11r. MONEY AND THE PRICE LEVEL. RATIO METHOD.
Showing the exact degree in which the price level in the United States
has fluctuated in comparison with the amount of money in circulation.
Since the war, there has been a close correspondence, changes in the
price level following changes in money by two or three months.


Another method is that suggested by Professor Alfred Marshall. It
consists in making a special geometric construction.* But recourse to
geometric construction is slow and inexact and does not appeal to the
eye.

As a matter of fact, users of charts, when seeking percentage 
comparisons, are apt simply to read off the four numerical 
figures opposite the various points (as A and A’ and B and B’ 
in Fig. 6 d or Fig. 7 d) and compute arithmetically the two 
percentage rates of increase, (i. e., the percentage excess of 
height of A’ over A and that of B’ over B). But such a proce- 
dure is nothing less than giving up the use of the diagram as 
a diagram and using it merely as a table of arithmetical figures. 
A diagram is supposed to interpret figures to the eye and to 
need no interpretation itself through arithmetical processes. 
When it does need such interpretation, it fails of its purpose 
as a quick, clear, convenient, and reliable picture. 

Finally we note that the foregoing correctives and safeguards
necessary for using the “difference” method correctly are, in actual
fact, almost invariably neglected. Practically no one uses Marshall’s
subtangent comparison; few even mentally measure the heights of points
above the base line, much less mentally reckon, from these heights,
the percentage rates of change; few even notice whether the base line
is inserted or omitted or whether there are different base lines for
different curves whose inconsistency needs to be allowed for; growth
axes are seldom constructed; few make any effort to shut their eyes to
straightness or parallelism in order to avoid being misled into
assuming uniformity or similarity of percentage changes; few, even,
use a diagram as a table of figures.

Professor Marshall says of the ordinary method of curve-plotting: “Its
defects are such that many statisticians seldom use it except for the
purpose of popular exposition, and for this purpose, I must confess,
it has great dangers.”† A business man whose attention was recently
called to the advantages of the ratio chart recharted his business
statistics and was startled to discover how he had been misled by the
ordinary method.

*“On the Graphic Method of Statistics,” Jubilee Vol. of the Royal
Statistical Society, 1885, pp. 251-60. To help translate the deceptive
slope of different lines, such as AA’ and BB’ in Fig. 6 d or Fig. 7 d,
we produce these lines so that they cut the base line. If, when so
produced, they are found to cut the base line at the same point, as is
the case in Fig. 6 d, where they both cut at Y, then we may know that
they have the same percentage slope. If, as in Fig. 7 d, they do not
cut the base line in the same point, the one which cuts the further
off has the smaller percentage rate of increase and the rates of
increase per annum of the two lines at A and B are inversely
proportional to the distances at which they cut the base as measured
from X. Thus if, in Fig. 7 d, XY is 14/20 of XZ the percentage change
at A is 20/14 of that at B. Again, in Fig. 4 d the slopes at A and A’
are equal, as XY=X’Y’. All of these results may be proved [...]

†“On the Graphic Method of Statistics,” Journal of the Royal
Statistical Society, 1885, p. 251.
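Marshall’s subtangent comparison, described in the footnote, comes to
a simple rule: on a difference chart the percentage rate at a point is
the slope divided by the height, which is the reciprocal of the
distance back to where the line, produced, cuts the base. A minimal
sketch with hypothetical heights and slopes; the function names are
illustrative and are not Marshall’s.

```python
def subtangent(height, slope):
    """Horizontal distance from the point back to where the line,
    produced, cuts the base line of a difference chart."""
    return height / slope

def percentage_rate(height, slope):
    """Rate of increase per unit of time as a fraction of the height;
    the reciprocal of the subtangent."""
    return slope / height

# As in Fig. 6 d: different slopes, but both lines cut the base at the same point.
a6 = (20.0, 2.0)   # (height, slope per annum), hypothetical
b6 = (50.0, 5.0)

# As in Fig. 7 d: equal slopes, but the lines cut the base at different points.
a7 = (28.0, 2.0)   # subtangent 14
b7 = (40.0, 2.0)   # subtangent 20

print("Fig. 6 d subtangents:", subtangent(*a6), subtangent(*b6))
print("Fig. 7 d subtangents:", subtangent(*a7), subtangent(*b7))
print("Fig. 7 d rate at A over rate at B:", percentage_rate(*a7) / percentage_rate(*b7))
```

With subtangents of 14 and 20 the rate at A is 20/14 of that at B, as
in the footnote; on a ratio chart none of this construction is needed.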

After such a severe, but, I believe, just, indictment against 
ordinary or “difference” plotting it may be asked whether, as 
usually employed, such plotting is not worse than useless. It 
is certainly true that ordinarily it misleads to some extent. I 
have sometimes found myself hesitating to use a curve in a
publication, because, on the one hand, of the space it would take up
if the base line were inserted and, on the other, of the mistaken 
impression it might make if the base line were omitted. 

The best that can be said for the difference method is: it 
always shows whether there is an increase or decrease; it 
usually displays the grosser contrasts at a glance; the base or 
zero line gives a means, lacking in the ratio method, for plotting 
zeros, for comparing positive and negative quantities, and for 
seeing in a simple and self-evident comparison the vertical 
elevations of points in a curve above the base line. 


WAYS OF UTILIZING DIRECTION IN THE RATIO CHART. 


The eye reads a ratio chart more rapidly than a difference 
chart or a table of figures. We may recapitulate what most 
easily catches the eye as follows: 

1. If we see a curve ascending, and nearly straight, we know 
that the statistical magnitude it represents is increasing at a 
nearly uniform rate. 

2. If the curve is descending, and nearly straight, the sta- 
tistical magnitude is decreasing at a nearly uniform rate. 

3. If the curve bends upward the rate of growth is increasing. 

4. If downward, decreasing.

5. If the direction of the curve in one portion is the same 
as in some other portion it indicates the same percentage 
rate of change in both. 

6. If the curve is steeper in one portion than in another 
portion it indicates a more rapid rate of change in the former 
than in the latter. 

7. If two curves on the same ratio chart run parallel they 
represent equal percentage rates of change. 

8. If one is steeper than another the first is changing at a 
faster percentage rate than the second. 
9. The imaginary straight line most nearly representing, to the eye,
the general trend of the curve, is its “growth axis,” and represents
the average rate of increase (or decrease); and the deviations of the
curve from this growth axis are plainly evident without recharting.

10. The slope of the imaginary line between any two points on a curve
indicates the average rate of change between the two.
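Several of these readings can be checked arithmetically. The sketch
below covers items 1 to 4 and item 10 only; the series are
hypothetical, the function names are illustrative, and the tolerance
used to call a curve “straight” is an arbitrary assumption.

```python
import math

def average_rate(t1, y1, t2, y2):
    """Average percentage rate per period implied by the straight line
    joining two points on a ratio chart (item 10)."""
    slope = (math.log(y2) - math.log(y1)) / (t2 - t1)
    return math.exp(slope) - 1

def describe(values, tolerance=1e-9):
    """Classify the shape a series takes on a ratio chart (items 1 to 4),
    for values taken at equal intervals of time."""
    slopes = [math.log(b / a) for a, b in zip(values, values[1:])]
    if all(abs(s - slopes[0]) < tolerance for s in slopes):
        return "straight"
    if all(b > a for a, b in zip(slopes, slopes[1:])):
        return "bends upward"
    if all(b < a for a, b in zip(slopes, slopes[1:])):
        return "bends downward"
    return "mixed"

print(describe([100, 110, 121, 133.1]))   # uniform 10 per cent.
print(describe([100, 110, 120, 130]))     # equal differences
print(describe([100, 105, 115, 135]))     # rate of growth increasing
print(round(average_rate(0, 100, 2, 121), 4))
```

An arithmetical progression is reported as bending downward, in
agreement with line B of Fig. 2 r.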


WAYS OF UTILIZING ELEVATION IN THE RATIO CHART. 


The preceding relates to direction. As to elevation, the eye 
can, with a little familiarity, translate vertical elevation into 
numerical ratio; for, as we have seen, a certain elevation re- 
presents a 10 per cent. increase, another a 100 per cent. in- 
crease, or doubling, etc., etc.* 

To one accustomed to using the difference chart the use of elevations
on the ratio chart may at first be confusing. But only a few minutes
are necessary to learn its use; and this use of the mere elevation, or
vertical distance, between any two points for measuring the ratio
between the two magnitudes represented by those points is really
easier and more exact than the more self-evident method, in the
ordinary chart, of comparing their two vertical distances from the
base line.†


*For roughly interpreting elevations without having to read the
marginal figures or measure distances, the original system of
equidistant horizontals, as shown in Fig. 1 r, is not inconvenient.
The chart is divided into horizontal 10 per cent. bands and the eye
can approximately reckon how far any point on a statistical curve is
vertically above any other point, whether one band’s breadth, or two,
or any fractional part, and, therefore, whether it represents a
magnitude 10 per cent. more, or 10 per cent. more than 10 per cent.
more, i. e., 21 per cent. more, etc.

†We may also, if we wish, make much more exact comparisons than those
afforded by the glance of an eye. We may exactly measure any slope
such as that of a tangent to a curve at any point or such as that of a
line connecting any two points on the curve. This may be done in
accordance with the constructions of the radiating lines in Fig. 3 r,
by drawing a line parallel to that of which the slope is desired, from
any convenient point in the horizontal line “10”. Let this cut the
vertical line one unit to the right of said point; the height of this
intersection above (or below) line “10” measures exactly the rate of
ascent (or descent) of the slope to be measured.

Again we may exactly measure any ratio comparison between two points
on the same curve, or on different curves for that matter. All we need
is to take the vertical distance between the two points and compare
this with the vertical scale above (or below) the line “10,” as
illustrated in Fig. 3 r. Such a comparison may be made by a
draughtsman’s dividers, parallel rulers, sliding triangle, or even by
using the edge of a sheet of paper and a lead pencil. Thus, if the
elevation, as measured, is found equal to the distance between “1” and
“2” (or “10” and “20” or “100” and “200”) then the upper point
represents a statistical magnitude just twice that represented by the
lower point; or again, if the measured elevation is equal to the
distance between “1” and “1.37” (or “10” and “13.7” or “100” and
“137”) on the scale, then the upper point represents a statistical
magnitude 37 per cent. greater than that represented by the lower
point.
SUMMARY.


In the ratio method, then, a straight line* always represents 
a constant percentage rate of increase or decrease and, con- 
versely, a constant percentage rate of increase or decrease is 
always represented by a straight line; a curve deviating from 
a straight line invariably implies that the percentage of change 
deviates correspondingly from constancy; any two curves or 
two portions of the same curve which are parallel represent 
exactly equal percentage rates of change; any two curves or 
portions of curves which show a contrast of direction always 
indicate a corresponding contrast in percentage change; if the 
numbers plotted are halved or changed in any other ratio, the 
resulting curve will simply be raised or lowered but will main- 
tain exactly the same series of directions and therefore present 
the same appearance to the eye; if the scale is properly selected, 
a curve is never nearly horizontal except when it actually 
represents an almost infinitesimal rate of increase or decrease, 
nor is it ever nearly vertical except when it actually represents 
a rate correspondingly enormous; as there is no zero line there 
is no waste space on its account and the diagrams can be cut 
off close, both above and below the curve; there can be no 
juggling with base lines or scales; there is no need of special 
supplementary geometric constructions, such as Marshall’s 
subtangent construction; there is no need of laborious calcu- 
lations to reduce original figures to index numbers or per- 
centages; there is no need of eliminating the growth axis 
(which, in the ratio method, is simply a straight line, the per- 
centage deviations from which are apparent without special 
calculation or replotting). 

The features of a curve which, whether we will or not, most 
“‘catch” the eye are concerned with comparative direction, — 
straightness or curvedness; steepness or flatness; parallelism 
or divergence. These features therefore ought to be, not 

* It is interesting to note that engineers have found it advantageous to devise special plotting charts 
which will reduce parabolic, hyperbolic and probability curves of the ordinary charts to straight lines on 
the special charts, e. g., see A. S. Langsdorf, “Methods for Determining the Equations of Experimental 
Curves,” Journal of the Association of Engineering Societies, June, 1904, pp. 325-43; L. F. Harza, “Notes 
on Determination of Experimental Equations,” Wisconsin Engineer, December, 1908; George C. Whipple, 
“Element of Chance in Sanitation,” Journal of the Franklin Institute, Vol. 182, July 7, 1916, p. 37; 
Allen Hazen, “Storage to be Provided in Impounding Reservoirs for Municipal Water Supply,” 
Proceedings of the American Society of Civil Engineers, Nov., 1913. 
snares nor stumbling blocks, as they are in the “difference” 
chart, but aids or sign boards as they are in the “ratio” chart. 

And, besides the full utilization of direction, we have, in the 
ratio chart, that of elevation. While the interpretation by the 
eye and mind of elevation requires a little preliminary training 
it soon becomes easier, more rapid and more accurate than 
the corresponding procedure for difference charts. 

In a word, the ratio chart simply utilizes the natural powers 
of the eye. Consequently, when one is once accustomed to it, 
it never misleads, but always pictures a multitude of ratio 
relations at a glance, with absolute fidelity and without the 
annoyance of reservations or corrections. 
ON THE VARIATE DIFFERENCE CORRELATION 
METHOD AND CURVE-FITTING. 


By WARREN M. PERSONS, Harvard University. 


(Professors A. A. Young of Cornell University and E. E. Day of Harvard 
University have generously read the manuscript of this paper and have made 
helpful suggestions. Computations and pertinent comments have been made 
by Mr. Edwin Frickey of Colorado College. W. M. P.) 


I. 


In Biometrika for April, 1914, “Student” generalized the 
method of differences for the elimination of spurious correlation 
due to order of items in time or space. In the same Journal 
for November, 1914, Dr. O. Anderson of Petrograd provided 
the probable errors of the successive difference correlations 
of two series where the correlations of random pairs of the 
variates are zero. In the same number B. M. Cave and Karl 
Pearson presented “Numerical Illustrations of the Variate 
Difference Correlation Method,” using Italian economic data. 

The conclusion of “Student” is this: “If we wish to eliminate 
variability due to position in time or space and to determine 
whether there is any correlation between the residual 
variations, all that has to be done is to correlate the 1st, 
2nd, 3rd . . . nth differences between successive values of 
our variable with the 1st, 2nd, 3rd . . . nth differences 
between successive values of the other variable. When the 
correlation between the two nth differences is equal to that 
between the two (n+1)th differences, this value gives the 
correlation required.”* The meaning of “Student” is that 
the correlation required is indicated by the ultimate steadiness 
of values of the correlation coefficient for higher multiple 
differences of the items.† 

After working with 11 series of 28 items each Miss Cave and 
Professor Pearson stated:‡ “In most cases our difference 
correlations have hardly even with the sixth differences reached a 
steady state. . . . In the great bulk of instances there is 
still a more or less steady rising or falling appreciable in the 
difference correlations, and all we can really say is that the 
final value, the true $r_{xy}$, will be somewhat greater or less 
than a given number. From an examination of the actual 
numerical working of the correlations, it appears to us that the 
terminal values are in the case of these short series of very 
great importance. It is further clear that the theory as given 
by ‘Student’ depends upon certain equalities which are not 
fulfilled in practice in short series. We await with much 
interest the complete publication of Dr. Anderson’s work, and 
hope to find a fuller discussion of the allowance to be made in 
short series for the influence of the terminal state of affairs 
on the steadiness of the series and on the approach to the 
standard deviation formulae. But apart from these lesser 
points, our present numerical investigation has convinced us 
of the very great value of the new method of Variate 
Difference Correlations.” 

* Biometrika, April, 1914, p. 179. 

† Biometrika, November, 1914, footnote, p. 340. 

‡ Ibid., pp. 354-355. 

In the demonstration leading to his theorem, previously 
quoted, “Student” stated that “if $x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc., 
be corresponding values of the variables $x$ and $y$, then if 
$x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc., are randomly distributed in 
time and space, it is easy to show that the correlation 
between the corresponding nth differences is the same as that 
between $x$ and $y$.”* In the proof of this statement certain 
assumptions were made as follows: first, that there is a 
significant correlation between items with similar subscripts, i. e., 
$\sum x_K y_K \neq 0$; second, that if the items of each series be paired 
with the preceding items of the same series the correlation 
will be zero, i. e., $\sum (x_K x_{K+1}) = 0$ and $\sum y_K y_{K+1} = 0$; third, that 
there is no significant correlation between the items of one 
series and the items of the other when there is a lag in either 
direction, i. e., $\sum (x_K y_{K+1}) = 0$, $\sum (x_{K+1} y_K) = 0$; fourth, that the 
sum of the items of each series is zero, i. e., $\sum x_K = 0$, $\sum y_K = 0$; 
fifth, that the time element in any series ordered in time can 
be expressed as an algebraic function of time ($t$) of some 
degree, the nth. 

* Biometrika, April, 1914, p. 179. 
It is my contention that these assumptions are such as 
cannot be retained in applying the method to the most 
common types of problems. For instance the pairing of items of 
two time series is made possible by the position of those items 
in time either because they occur in the same time interval 
(concurrent) or in definitely related intervals (lag). Our 
problem may be, and usually is, not only to determine the 
correlation but to find what pairings give the maximum 
correlation. In such case the assumption that only one pairing is 
significant vitiates the conclusion at the outset. The writers 
on the variate difference correlation method all assume that 
“the true $r_{xy}$” is for pairs concurrent in time. 

Let the method of variate differences be applied to two time 
series artificially constructed. Let series A be made up of 
successive values of the function $2t+3$ with +1 and -2 
alternately added to the items. Let series B be made up of 
successive values of the function $t^2+3t+2$ with -1 
and +1 alternately added to the items. We will have the 
following series and differences: 


SERIES A. 

Items    1st D    2nd D 
  4 
  3       -1 
  8       +5       +6 
  7       -1       -6 
 12       +5       +6 
 11       -1       -6 
etc. 


SERIES B. 

Items    1st D    2nd D 
  1 
  7       +6 
 11       +4       -2 
 21      +10       +6 
 29       +8       -2 
 43      +14       +6 
etc. 
The process of taking differences eliminates the elements 
due respectively to the first and second degree functions of 
time. The oscillating elements remain. If concurrent items 
for the third and higher differences are paired, we have 
$r_0^{(3)} = r_0^{(4)} = \dots = r_0^{(n)} = -1$, the superscript 
denoting the order of difference. If the items are paired with a 
lag of one in either direction we have 
$r_{+1}^{(3)} = r_{-1}^{(3)} = r_{+1}^{(4)} = r_{-1}^{(4)} = \dots = r_{+1}^{(n)} = r_{-1}^{(n)} = +1$.* 
In this case the time elements 
are eliminated as far as they can be by the method of 
differences and yet the series resulting are not, properly speaking, 
random in time. It will be of interest to determine if this 
sort of oscillation occurs in actual data. 
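The artificial example of Part I can be checked numerically. The following is a minimal sketch only: the helper names (`differences`, `pearson`, `lagged_pairs`), the choice of twelve items, and the sign convention for the lag are assumptions made for illustration and are not taken from the paper.

```python
def differences(series, order=1):
    """Return the finite differences of the given order (order 0 returns a copy)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def pearson(u, v):
    """Ordinary product-moment coefficient of correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def lagged_pairs(u, v, lag):
    """Pair u[K + lag] with v[K]; the sign convention is an assumption."""
    if lag >= 0:
        return u[lag:], v[:len(v) - lag]
    return u[:len(u) + lag], v[-lag:]


n_items = 12  # assumed length; the paper shows only the first six items
series_a = [2 * t + 3 + (1 if t % 2 == 0 else -2) for t in range(n_items)]
series_b = [t * t + 3 * t + 2 + (-1 if t % 2 == 0 else 1) for t in range(n_items)]

d3a, d3b = differences(series_a, 3), differences(series_b, 3)
r0 = pearson(d3a, d3b)                          # concurrent third differences
r_plus = pearson(*lagged_pairs(d3a, d3b, 1))    # lag of one item
r_minus = pearson(*lagged_pairs(d3a, d3b, -1))  # lag of one item, other direction
print(series_a[:6], series_b[:6], r0, r_plus, r_minus)
```

The third differences of the two series are pure alternations of opposite phase, which is why the concurrent coefficient comes out at -1 and the lagged coefficients at +1, up to rounding.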


II. 


I have recently been working with some 21 series of economic 
statistics for the United States for the period 1879 to 1913, giv- 
ing 35 items to each series. Application of the variate differ- 
ence correlation method to such series has forced me to the 
conclusion that neither the possibilities nor the limitations of 
the method, when applied to short series, have been appre- 
ciated by the writers on the subject. 

The applications of the method by “Student” and by Miss 
Cave and Professor Pearson are not satisfactory tests: first, 
because they have applied it merely to items of the same date, 
thus assuming that the real correlation can exist only for such 
pairs; and, second, they assume that the correlation indi- 
cated by the steadiness of coefficients between higher differ- 
ences only is significant, the coefficient for first differences not 
being significant unless it is supported by steadiness of the 
coefficients of higher differences. My objections to these as- 
sumptions and the conclusions based upon them will be il- 

* In r; the subscript ¢ indicates the number of time intervals that the items of series A precede (—) or lag 


behind (+) the corresponding items of series B. 

+ At the conclusion of his article in Biometrika, April, 1914, pp. 269-279, Mr. O. Anderson makes the 
following statement, which I translate: 

“If we take into consideration that for our purposes the evolutionary component of a series has disap- 
peared if it becomes so smal! relative to the oscillatory component that it can influence only the 3rd, 4th, 
etc., decimal place of the expression for R [the coefficiént of correlation] then we may conclude that not 
only components which are represented by a parabola of higher order, but also those represented by tran- 
scendental functions (such as a sine curve) become eliminated taking by a finite number of differences. 
Further it may be shown that generally all more or jess ‘smooth series’ all of which are characterised by a 
considerable degree of positive correlation between adjacent items, lose the character of smoothness in the 
process of multiple differences. The generalized Cave-Hooker procedure is, therefore, manifestly a quite 
universal means of sifting out the correlation between the oscillatory elements of complex series.” 
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lustrated by application of the variate difference correlation 
method to American data for the period 1879-1913. The 
series used are the following: 


Wholesale prices of commodities. 

Gross receipts of railroads. 

Net earnings of railroads. 

Coal production. 

Exports from the United States. 

. Imports into the United States. 

Pig-iron production. 

. Price of pig-iron. 

Immigration (fiscal year). 

Shares sold on the New York Stock Exchange. 

Average price of shares sold on the New York Stock 

Exchange. 
. New York clearings plus five times outside clearings 
(called clearing index). 

13. Clearing index divided by relative wholesale prices 
(called corrected clearing index). 

14. New railroad mileage constructed. 

15. Per cent. of business failures. 

16. Liabilities of business failures. 

17. Balance of trade. 

18. Weighted index numbers of the yield per acre of nine 
leading crops. 

19. Ratio of loans to resources of banks. 

20. Ratio of cash to deposits of banks. 

21. Surplus reserves of New York associated banks. 


—~ PP ONS Om Oo bo 


— 


— 
bo 


In each case, except for the clearing index, I assumed the 
secular trend to be linear. A straight line was fitted to each 
series by the method of least squares or the method of 
moments. For the clearing index I assumed the secular trend to 
be the compound interest law and the function $y = BC^t$, where 
$t$ represents time and $B$ and $C$ are constants determined by the 
data, was fitted to the series. The deviations of the raw 
figures from the lines of secular trend were found and designated 
the “cycles.” 
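The computation of the "cycles" may be sketched as follows. This is an illustration under stated assumptions, not the author's worksheet: the function names are hypothetical, the figures in `example` are invented, and fitting $y = BC^t$ by a least-squares line through the logarithms is only one possible route (the paper does not say how the constants $B$ and $C$ were determined).

```python
import math


def fit_line(y):
    """Least-squares straight line y = m*t + b for t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sty = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    stt = sum((t - t_mean) ** 2 for t in range(n))
    m = sty / stt
    return m, y_mean - m * t_mean


def linear_cycles(y):
    """Deviations of the raw figures from a fitted straight-line trend."""
    m, b = fit_line(y)
    return [v - (m * t + b) for t, v in enumerate(y)]


def compound_interest_cycles(y):
    """Deviations from y = B * C**t, fitted through log(y) (an assumed method)."""
    m, b = fit_line([math.log(v) for v in y])
    big_b, big_c = math.exp(b), math.exp(m)
    return [v - big_b * big_c ** t for t, v in enumerate(y)]


# Hypothetical figures, not taken from the paper.
example = [10.0, 12.5, 11.0, 15.0, 14.0, 18.5, 17.0]
cycles = linear_cycles(example)
print([round(c, 3) for c in cycles])
```

With a least-squares line the deviations sum to zero, which is the property used later when the averages of the cycles are treated as zero.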





TABLE I. 


COEFFICIENTS OF CORRELATION FOR THE CYCLES OF THE BUSINESS BAROMETER 
AND THE CYCLES OF WHOLESALE PRICES, GROSS RECEIPTS OF RAILROADS, 
CORRECTED CLEARING INDEX AND SURPLUS RESERVES OF NEW YORK BANKS 
TOGETHER WITH COEFFICIENTS FOR MULTIPLE DIFFERENCES, FIRST TO SIXTH, 
WITH VARIOUS DEGREES OF LAG, 1879-1913. 

Coefficients of Correlation. (a) 

[Table body illegible in the source.] 

(a) The subscript i of the coefficient of correlation r indicates the years lag (+) or years previous (-) 
of the Business Barometer. 

Investigation of the 21 series of cycles led to the conclusion 
that the fluctuations of 9 of them synchronize and hence can 
logically be combined into a business barometer. Consequently 
a business barometer is constructed of the 9 series, those 
numbered 1 to 9 in the list given above.* 
The coefficients of correlation for the cycles of the business 
barometer and the cycles of wholesale prices, gross receipts of 
railroads, corrected clearing index and surplus reserves of 
New York associated banks, together with the coefficients of 
correlation for the first to sixth differences, with various degrees 
of lag for cycles and differences, are given in Table I. Of 
course there is an element of spurious correlation in the 
coefficients for the business barometer, wholesale prices, and gross 
receipts of railroads because the two last named series enter 
into the barometer. That element is not believed to be large 
for the cycles and first differences. 

* For details of this investigation see the article by the writer in the American Economic Review, 
December, 1916. 

Examination of Table I reveals: first, high, positive, and 
steady coefficients for concurrent items for wholesale prices 
and gross receipts of railroads, high, negative, and steady 
coefficients for surplus reserves, low, positive, and fairly steady 
coefficients for the corrected clearing index; second, all the 
coefficients for higher differences show a marked tendency to 
alternate in algebraic sign as successive degrees of lag are 
taken in either direction. 


TABLE II. 

PROBABLE ERRORS OF COEFFICIENTS OF CORRELATION FOR SERIES OF 35 AND 46 
ITEMS, RESPECTIVELY, AND FOR THEIR MULTIPLE DIFFERENCES, FIRST TO 
SIXTH.(a) 

(Thirty-five Items in Original Series.) 

[Table body illegible in the source.] 

(a) [Computed from] formulae developed by O. Anderson and A. Ritchie-Scott. See Biometrika, 
November, [year illegible], p. 136. 
The significance of the various coefficients depends not only 
on their size but upon the number of items used in the 
computation. The probable errors for various coefficients based 
on 35 items in the original series, and on their first to sixth 
differences, are given in Table II.* The probable errors for sixth 
differences are, approximately, twice those for the original series. A 
coefficient of, say, .45 or more would be significant for the 
original series and of .65 or more for sixth differences. What 
is the explanation of the observed steadiness, and of the 
alternation of sign of coefficients for various degrees of lag? 
“Student” believes that the steadiness is due to the random 
distribution, with respect to time, of the differences. The 
alternation in sign is a phenomenon not noticed, or if noticed 
not considered, by the writers on the subject. 

First, let us consider the phenomenon of alternation in sign. 
Let one of the original series be $x_0, x_1, x_2, \dots, x_{n-1}$, the 
first differences being $x_1 - x_0, x_2 - x_1, \dots, x_{n-1} - x_{n-2}$.† 
Suppose the first differences alternate in sign at any point so 
that we have 

$x_K - x_{K-1} = +a$ 

$x_{K+1} - x_K = -b$ 

$x_{K+2} - x_{K+1} = +c$ 

etc. 

where $a, b, c, \dots$ are positive numbers. The second 
differences are $-b-a$, $+c+b$, $-d-c$, . . ., a series 
alternating in sign and larger numerically than the series from 
which it is derived. For the portion of the original series, 
however, where there is no alternation in sign the first 
differences will be smaller numerically than the items from which 
they are derived. Since the nth differences are derived from 
the (n-1)th differences in the same way that the first 
differences are derived from the original series we have the following 
conclusion: If consecutive items of a series alternate in sign 
the first and higher differences will also alternate in sign and 
the resulting items will increase numerically as the order of 
the difference increases. A succession of like signs may persist 
with the first and higher differences but the numbers resulting 
will be smaller numerically than those resulting where the 
sign alternates. Where the variate difference method is 
applied to two short series we may, therefore, expect the terms 
alternating in sign to be of dominating influence upon the 
coefficient of correlation. Also when a lag is taken in either 
direction the coefficients will tend to alternate in sign. 

* These probable errors are computed from theories developed by O. Anderson, Biometrika, April, 
1914, p. 269. The formulae for probable errors may hold for a very large number of items but I 
doubt their validity for less than 100 items. 

† The standard notation for successive finite differences is $\Delta$, $\Delta^2$, $\Delta^3$, etc. That notation is not 
used in this paper because it would tend to conceal relations brought out in Part III of this paper. 
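A small worked case may make the amplification of alternating terms concrete. The pure alternation assumed below (items equal to $+a$ and $-a$ in turn) is an illustrative assumption and not a series from the paper, and $\Delta^m$ is written here for the mth difference although the paper itself avoids that notation.

```latex
% Assumed illustrative series: x_K = (-1)^K a, with a > 0.
\[
x_K - x_{K-1} = (-1)^K a - (-1)^{K-1} a = 2\,(-1)^K a .
\]
% Each further differencing doubles the size and keeps the alternation:
\[
\Delta^m x_K = 2^m\,(-1)^K a .
\]
% By contrast, for a steady run x_K = cK the first differences all equal c
% and the second and higher differences vanish.
```

On this reading, six differencings multiply a purely alternating element by 64 while leaving a steady element at zero, which is the sense in which the alternating terms come to dominate the products entering the coefficient.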


TABLE III. 


PERCENTAGES FOUND BY TAKING THE RATIO OF THE NUMBER OF CASES IN WHICH 
SUCCESSIVE ITEMS DIFFER IN SIGN TO THE POSSIBLE NUMBER OF ALTERNATIONS 
IN SIGN OF SUCCESSIVE ITEMS; VARIOUS SERIES, FIRST TO SIXTH DIFFERENCES, 
1879-1913. 

Percentage of Unlike Signs. (a) 

Series tabulated (the individual percentages are illegible in the source): 

Liabilities of commercial failures 
Gross receipts of railroads 
Percentage of loans to resources of banks 
Percentage of cash to deposits in banks 
Average price of shares on N. Y. St. Exc. 
Corrected clearing index 
Exports of merchandise 
Imports of merchandise 
Pig-iron produced 
Net earnings of railroads 
Price of pig-iron 
Surplus reserves of New York banks 
New railroad mileage constructed 
Indices of crop yield, 3-yr. averages 

                          1st D.  2nd D.  3rd D.  4th D.  5th D.  6th D. 
Average of percentages      50      64      70      75      79      82 

(a) A + or - item followed by a null item or vice versa is counted ½. The possible numbers of unlike 
signs are 33, 32, 31, 30, 29 and 28 for the differences, 1st to 6th. 

Table III shows the tendency of the higher differences of 
the series here under consideration to alternate in sign. With 
34 items in the series of first differences there are 33 possible 
alternations in sign when each term is compared with the 
preceding term; there are also 33 possible cases of steadiness 
in sign. Counting the number of times that the signs of 
successive terms of each series alternate and expressing the 
number as a percentage of the possible number (32 for second 
differences, 31 for third differences, etc.) we have the 
percentages appearing in the table. For numbers chosen at 
random we should expect the first differences to have 50 per 
cent. steadiness in sign and 50 per cent. alternation, on the 
average. We get exactly 50 per cent. as the average for the 
series listed. For second and higher differences the average 
percentages of alternation are: 64, 70, 75, 79, 82. A marked 
tendency to alternation in sign is revealed. The cumulative 
effect of this tendency to alternate in sign as higher differences 
are taken is also revealed in Table IV. Where the series A, 
B, C, D, and E are correlated with themselves (AA, BB, CC, 
DD, EE) there is a numerically increasing but negative 
coefficient for a lag of one item. Where all possible combinations 
of the five series are taken the coefficients alternate in sign 
or show a strong tendency to do so with the higher differences. 
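The counting rule behind Table III can be sketched as below. The function names and the sample figures are hypothetical, and the half-weight given to a signed item next to a null item follows note (a) of the table as it is read here; a pair of two null items is assumed to count as no alternation.

```python
def differences(series, order=1):
    """Return the finite differences of the given order."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def percent_unlike_signs(items):
    """Percentage of successive pairs of items that differ in sign.

    A signed item next to a null item counts one half (assumed reading
    of note (a)); the divisor is the possible number of alternations.
    """
    score = 0.0
    for a, b in zip(items, items[1:]):
        sa, sb = sign(a), sign(b)
        if sa * sb < 0:
            score += 1.0
        elif (sa == 0) != (sb == 0):
            score += 0.5
    return 100.0 * score / (len(items) - 1)


# Hypothetical cycle figures, not taken from the paper.
sample = [3, 5, 4, -2, -6, -1, 2, 7, 3, -4, -5, 1]
for order in range(1, 7):
    print(order, round(percent_unlike_signs(differences(sample, order)), 1))
```

For a series of 35 items the first differences number 34, so the divisor `len(items) - 1` gives the 33 possible alternations mentioned in the text, then 32, 31, and so on for higher differences.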


TABLE V. 


COEFFICIENTS OF CORRELATION BETWEEN TWO RANDOM SERIES OF 35 ITEMS EACH 
AND BETWEEN THEIR MULTIPLE DIFFERENCES, FIRST TO EIGHTH, CONCURRENT 
AND A LAG OF ONE AND OF TWO ITEMS IN EITHER DIRECTION. 

Coefficients of Correlation. 

[Table body illegible in the source.] 
This theory of the tendency of signs of terms of higher 
differences to alternate and, therefore, to affect the 
coefficients of correlation was tested by applying the method of 
variate differences to two random series of 35 items each. 
The method of selection of the numbers was this: the pages 
of a table of six-place logarithms were turned at random, the 
tip of a pointer was placed at random on the page and the 
two digits at the right of the logarithm indicated by the 
pointer were taken as the items of the series. The coefficients 
of correlation between the two random series of 35 items and 
between their multiple differences, first to eighth, concurrent 
and for one and two items lag in each direction are given in 
Table V. For the first and higher differences there is a 
persistent alternation in sign as we take a lag in either direction.* 
The coefficients alternate in sign, of course, because, first, the 
two series correlated alternate in sign, second, the terms 
alternating in sign become the dominating ones when the 
products of corresponding items are taken and third, lagging 
either series in either direction will bring a different set of 
signs into correspondence. The term $\sum xy$ tends to alternate 
in sign and hence $r$ does. 
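An experiment of the same kind as that of Table V can be repeated with a pseudo-random generator in place of the table of logarithms. This is a sketch under assumptions: the seed, the helper names and the use of Python's `random` module are illustrative choices, and the figures printed will not reproduce the paper's table.

```python
import random


def differences(series, order=1):
    """Return the finite differences of the given order (order 0 returns a copy)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def pearson(u, v):
    """Ordinary product-moment coefficient of correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def lagged_pairs(u, v, lag):
    """Pair u[K + lag] with v[K]; the sign convention is an assumption."""
    if lag >= 0:
        return u[lag:], v[:len(v) - lag]
    return u[:len(u) + lag], v[-lag:]


random.seed(0)  # arbitrary seed, chosen only so that the run is repeatable
x = [random.randint(0, 99) for _ in range(35)]  # two-digit items, as in the paper
y = [random.randint(0, 99) for _ in range(35)]

for order in range(0, 9):
    dx, dy = differences(x, order), differences(y, order)
    row = [pearson(*lagged_pairs(dx, dy, lag)) for lag in (-2, -1, 0, 1, 2)]
    print(order, " ".join(f"{r:+.2f}" for r in row))

# A series correlated with itself at a lag of one item, sixth differences.
d6 = differences(x, 6)
self_lag1 = pearson(*lagged_pairs(d6, d6, 1))
print(f"{self_lag1:+.2f}")
```

For an ideally random series it can be shown that the coefficient of the mth differences with themselves at a lag of one item tends to $-m/(m+1)$, which agrees with the "numerically increasing but negative" coefficients the text reports for the series correlated with themselves.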


III. 


Does this phenomenon of the alternation in sign of the 
coefficients, left to right, in the tables, have any bearing on the 
steadiness or unsteadiness of the coefficients, in the tabular 
columns, based on successive differences but with the same 
lag throughout? This question will now be considered. 

The series of nth differences is derived from the series of 
(n-1)th differences by the same process that the series of first 
differences is derived from the original figures. Therefore, 
the expression for the coefficient of correlation for nth 
differences is the same function of the (n-1)th differences that the 
coefficient for first differences is of the original series. 

Let $r_L^{(m)}$ represent the coefficient of correlation between the 
mth differences of the series $x_0, x_1, \dots, x_{n-1}$ and 
$y_0, y_1, \dots, y_{n-1}$, where the subscript $L$ denotes the lag of the 
$x$ series. When $L=1$ we have the pairs $x_1 y_0, x_2 y_1, \dots, 
x_{n-1} y_{n-2}$; when $L=-1$ we have the pairs $x_0 y_1, x_1 y_2, \dots, 
x_{n-2} y_{n-1}$, etc. The formula for the coefficient of correlation 
between concurrent items of the original series is usually 
written in the form 

$$r_0 = \frac{\sum_{K=0}^{n-1} (x_K - \bar{x})(y_K - \bar{y})}{\sqrt{\sum_{K=0}^{n-1} (x_K - \bar{x})^2 \, \sum_{K=0}^{n-1} (y_K - \bar{y})^2}} \qquad (1)$$

where $\bar{x}$ and $\bar{y}$ are arithmetic averages of the respective series. 


* None of the coefficients found signify appreciable correlation between the series. In but one case is 
the coefficient more than three times its probable error and in the majority of cases the probable error is 
approximately the same as the coefficient. 
The function $r_0$ may be written 

$$r_0 = \frac{n \sum_{0}^{n-1} x_K y_K - \sum_{0}^{n-1} x_K \sum_{0}^{n-1} y_K}{\sqrt{\left[ n \sum_{0}^{n-1} x_K^2 - \left( \sum_{0}^{n-1} x_K \right)^2 \right] \left[ n \sum_{0}^{n-1} y_K^2 - \left( \sum_{0}^{n-1} y_K \right)^2 \right]}} \qquad (2)$$

The coefficient of correlation for the first differences 
$x_1 - x_0$, $y_1 - y_0$; $x_2 - x_1$, $y_2 - y_1$; . . . $x_{n-1} - x_{n-2}$, 
$y_{n-1} - y_{n-2}$ is, noticing that $\sum_{1}^{n-1} (x_K - x_{K-1}) = x_{n-1} - x_0$ and 
$\sum_{1}^{n-1} (y_K - y_{K-1}) = y_{n-1} - y_0$, in terms of the original items, 

$$r_0^{(1)} = \frac{(n-1) \sum_{1}^{n-1} (x_K - x_{K-1})(y_K - y_{K-1}) - (x_{n-1} - x_0)(y_{n-1} - y_0)}{\sqrt{\left[ (n-1) \sum_{1}^{n-1} (x_K - x_{K-1})^2 - (x_{n-1} - x_0)^2 \right] \left[ (n-1) \sum_{1}^{n-1} (y_K - y_{K-1})^2 - (y_{n-1} - y_0)^2 \right]}} \qquad (3)$$

Assuming $x_{n-1} - x_0$ and $y_{n-1} - y_0$ to be negligible in 
comparison with other terms of the function (called assumption a) 
we will discard the former. The function becomes 

$$r_0^{(1)} = \frac{2 \sum_{0}^{n-1} x_K y_K - (x_0 y_0 + x_{n-1} y_{n-1}) - \left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)}{\sqrt{\left[ 2 \sum_{0}^{n-1} x_K^2 - (x_0^2 + x_{n-1}^2) - 2 \sum_{1}^{n-1} x_K x_{K-1} \right] \left[ 2 \sum_{0}^{n-1} y_K^2 - (y_0^2 + y_{n-1}^2) - 2 \sum_{1}^{n-1} y_K y_{K-1} \right]}} \qquad (4)$$

Assuming that the successive items of each series are random in 
order, first with respect to the adjacent items of the same series 
and, second, with respect to the items adjacent to the 
concurrent item of the other series, that is assuming 
$\sum_{1}^{n-1} x_{K-1} y_K$, $\sum_{1}^{n-1} x_K y_{K-1}$, $\sum_{1}^{n-1} x_K x_{K-1}$, and 
$\sum_{1}^{n-1} y_K y_{K-1}$ all equal zero (called assumption b) we have 

$$r_0^{(1)} = \frac{2 \sum_{0}^{n-1} x_K y_K - (x_0 y_0 + x_{n-1} y_{n-1})}{\sqrt{\left[ 2 \sum_{0}^{n-1} x_K^2 - (x_0^2 + x_{n-1}^2) \right] \left[ 2 \sum_{0}^{n-1} y_K^2 - (y_0^2 + y_{n-1}^2) \right]}} \qquad (5)$$

Assuming $x_0 y_0 + x_{n-1} y_{n-1}$, $x_0^2 + x_{n-1}^2$ and $y_0^2 + y_{n-1}^2$ to be 
negligible in comparison with other terms in the function (called 
assumption c) we have, discarding the three terms named, 

$$r_0^{(1)} = \frac{\sum_{0}^{n-1} x_K y_K}{\sqrt{\sum_{0}^{n-1} x_K^2 \, \sum_{0}^{n-1} y_K^2}} = r_0, \text{ approximately, if } \bar{x} = \bar{y} = 0.$$

That is, if at any time assumptions a, b, and c hold true for 
any series of multiple differences, the coefficient of correlation 
for their first differences will equal the coefficient for the items 
from which the first differences were derived. If the same 
assumptions (a, b, and c) hold true for the series of first 
differences then $r_0^{(1)} = r_0^{(2)}$. Further if the assumptions hold true 
repeatedly for successive differences from and after the pth, 
we have ($m > p$) 

$$r_0^{(p)} = r_0^{(p+1)} = \dots = r_0^{(m)}.$$

It is obvious then, that the coefficients of correlation for 
successive differences will remain stable if assumptions a, b, and 
c hold true for successive differences. The condition 
(assumption that a, b, and c hold true repeatedly) is sufficient to 
produce stability, but is it necessary? 
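The three stages of the reduction (the exact coefficient for first differences, the form left after assumption a, and the form left after assumption b) can be computed side by side from the sums of the original items. The sketch below is an illustration only: the function name and the two short series are hypothetical, and the form numbers in the comments refer to the paper's forms (4) and (5) as they are read here.

```python
def pearson(u, v):
    """Ordinary product-moment coefficient of correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def first_difference_forms(x, y):
    """Return (exact, form4, form5) for concurrent first differences."""
    n = len(x)
    dx = [x[k] - x[k - 1] for k in range(1, n)]
    dy = [y[k] - y[k - 1] for k in range(1, n)]
    exact = pearson(dx, dy)

    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    # Products of each item with its neighbour in the other or the same series.
    cross = sum(x[k - 1] * y[k] + x[k] * y[k - 1] for k in range(1, n))
    xx1 = sum(x[k] * x[k - 1] for k in range(1, n))
    yy1 = sum(y[k] * y[k - 1] for k in range(1, n))
    ends_xy = x[0] * y[0] + x[-1] * y[-1]
    ends_xx = x[0] ** 2 + x[-1] ** 2
    ends_yy = y[0] ** 2 + y[-1] ** 2

    # Form (4): the sums of the differences themselves are discarded.
    form4 = (2 * sxy - ends_xy - cross) / (
        (2 * sxx - ends_xx - 2 * xx1) * (2 * syy - ends_yy - 2 * yy1)) ** 0.5
    # Form (5): the neighbour products are discarded as well.
    form5 = (2 * sxy - ends_xy) / (
        (2 * sxx - ends_xx) * (2 * syy - ends_yy)) ** 0.5
    return exact, form4, form5


# Hypothetical series whose first and last items coincide, so that the
# sums of the first differences are exactly zero.
x = [1.0, 4.0, 2.0, 5.0, 1.0]
y = [2.0, 3.0, 1.0, 6.0, 2.0]
exact, form4, form5 = first_difference_forms(x, y)
print(exact, form4, form5)
```

In this constructed case assumption a holds exactly, so the exact coefficient and form (4) agree to rounding; any gap between form (4) and form (5) then measures how far the neighbour products are from balancing.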

Consider form (4) of the function $r_0^{(1)}$. If the two terms, 
$\sum_{1}^{n-1} x_K x_{K-1}$ and $\sum_{1}^{n-1} y_K y_{K-1}$, appearing in the denominator are 
negative in sign, if the expression 
$\left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)$ is 
opposite in sign to $\sum x_K y_K$, and if the terms are of 
appropriate size as well as sign, then dropping 
$\sum_{1}^{n-1} x_K x_{K-1}$, $\sum_{1}^{n-1} y_K y_{K-1}$, and 
$\left( \sum_{1}^{n-1} x_{K-1} y_K + \sum_{1}^{n-1} x_K y_{K-1} \right)$ 
will not affect the value of $r_0^{(1)}$. Also $r_0^{(1)}$ will approximately 
equal $r_0$. In other words the fulfilling of the assumptions 
named will result in stable coefficients for successive 
differences. It is my contention that for a moderate number of 
items (n=35 or 40) the conditions here specified are apt to 
occur and be the cause of any stability of the coefficients of 
correlation between multiple differences. 





TABLE VI. 


COEFFICIENTS OF CORRELATION BETWEEN THE BUSINESS BAROMETER AND THE 
CLEARING INDEX FOR THE UNITED STATES, FIRST TO SIXTH DIFFERENCES, 
CONCURRENT AND LAG AT ONE AND TWO ITEMS IN EITHER DIRECTION, 1879-1913. 

Coefficients of Correlation. 

[Table body illegible in the source.] 

The results of applying the variate difference correlation 
method to the business barometer and the clearing index 
(not corrected for prices) are tabulated in Table VI. The 
first to sixth differences were used and the items were lagged 
in both directions. The stability of the coefficients for con- 
current items second to sixth differences (r¥, ri, r'*, r?, r) 
and the instability of the coefficients for one item lag in the 


business barometer (rf, to hi) are noticeable. Let us investi- 
si 


gate the stability of r* and r® and the instability of r#, and 
r4,. Using from (4) we have these values: 








r* uw 


o 





222 RY K + 888 


(2,Y,+2,_Yp—1) 
(Zo _ Ya t+Zt_VK_,) 
2zz% 

(2?+2? _.) 

22igTR_; 

2zy, 

(y2+y?_.,) 
22UKYR-1, 





- 9 
Computing r™, r* and rt and r4, in two ways, by form (4) 
and form (5), we have the following results: 








Form (4). Form (5). 


























The stability of ré* and r® is explainable by the balancing effect 
of items (22x—Wxt2rxyx-1); Lrxtx-; and Lyxyx-, the in- 
stability of ri‘, and r4,, is explainable by the lack of balance 
between hen items as they appear in the function. The 
items named are not approximately zero, in the case of 
r* and r®, as “Student” assumes, but numerically as large as 


Er KYK: 
TABLE VII. 


COEFFICIENTS OF CORRELATION BETWEEN CONCURRENT ITEMS OF BUSINESS 
BAROMETER AND PRICES, RAILROAD GROSS EARNINGS, CORRECTED CLEARING 
INDEX AND SURPLUS RESERVES OF NEW YORK BANKS COMPUTED (I) BY EXACT 
FORMULA, (II) BY DISCARDING $\sum (x_K - x_{K-1})$ AND $\sum (y_K - y_{K-1})$ AND (III) BY 
DISCARDING $\sum x_{K-1} y_K$, $\sum x_K y_{K-1}$, $\sum x_K x_{K-1}$ AND $\sum y_K y_{K-1}$ 
[remainder of title illegible]. 

Columns for each series: I, II (a), III (b). 

[Table body illegible in the source.] 

(a) [Partly illegible] . . . of the variate differences are approximately zero. 
(b) [Partly illegible] . . . of a lower order of difference. Approximate steadiness is reached not 
because the terms discarded are approximately zero but because they “balance.” 
[Partly illegible] . . . that the terms discarded are large is given by the coefficients of 
correlation for a lag in Table IV. 
Table VII gives the results of applying (I) the exact 
formula, (II) the formula obtained by assuming $\sum (x_K - x_{K-1}) = 0$, 
$\sum (y_K - y_{K-1}) = 0$, and (III) the formula obtained by assuming 
that the terms $\left( \sum x_{K-1} y_K + \sum x_K y_{K-1} \right)$, $\sum x_K x_{K-1}$ and $\sum y_K y_{K-1}$ 
balance and therefore may be discarded, to the business 
barometer and each of the series, 
wholesale prices, railroad gross earnings, corrected clearing 
index, and surplus reserves of New York associated banks. 
Form II gives coefficients practically identical with those 
resulting from the exact Form I, showing that the 
assumption involved can safely be made. Table VIII illustrates the 
negligibility of the algebraic sums by comparing the algebraic 
sums of the second differences of various series with the 
absolute sums of the same items. Form III of Table VII gives 
coefficients which are equal to those for preceding differences 
as would be expected. 
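The comparison made in Table VIII amounts to setting the algebraic sum of a series of differences against the sum of their absolute values. A minimal sketch follows; the function names and the sample figures are hypothetical and not drawn from the paper.

```python
def differences(series, order=1):
    """Return the finite differences of the given order."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sum_ratio(series, order=2):
    """Return (algebraic sum, absolute sum, ratio) for the chosen differences."""
    d = differences(series, order)
    algebraic = sum(d)
    absolute = sum(abs(v) for v in d)
    return algebraic, absolute, algebraic / absolute


# Hypothetical figures, not taken from the paper.
sample = [1, 3, 2, 5, 4, 6]
print(sum_ratio(sample))
```

Because the sum of any series of differences telescopes to the last item less the first item of the series differenced, the algebraic sum stays small while the absolute sum grows with every alternation, which is why the ratio is expected to be small.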


TABLE VIII. 


ALGEBRAIC SUMS AND ABSOLUTE SUMS OF THE SECOND DIFFERENCES OF SIX SERIES 
WITH RATIO OF FORMER TO LATTER. 








Absolute 
Sum. 


Second Differences of: - ” 





0 
-11 
be 4 
Corrected clearing index —6 
Surplus reserves of New York banks +2 

















Consideration of the effects that assumptions a, b, and c 
have upon the value of the coefficients for multiple differences 
leads to the conclusion that assumptions a and c are in 
accordance with the facts and that making them will not, for series 
of 35 or more items, in general affect the coefficient of 
correlation by more than .01. Assumption b, however, does not 
usually hold true. Nevertheless, stability is frequently 
secured by a balance between the terms appearing in numerator 
and denominator. Instability of the coefficients is explainable 
by lack of such balance. The fulfilling of assumption b is 
sufficient but not necessary to secure stability. Moreover, 
stability is usually secured when assumption b does not hold 
true. In view of this conclusion, what significance has the 
variate difference correlation method for short economic time 
series? This question will now be considered. 


IV. 


The items of annual time series of economic data may be 
conceived to be constituted of the following elements or com- 
ponent parts: 

First, the secular trend or growth element due to the in- 
crease of population and development of industry; 

Second, cyclical fluctuations, extending over a number of 
years and having a greater or less degree of periodicity, due 
to the alternating periods of business prosperity and depres- 
sion; 

Third, irregular fluctuations from year to year due to the 
influence of accidental or, at any rate, unpredictable events 
such as inventions, striking changes in fashion, or war. 


FIGURE 1 











TIME IN YEARS 


A—REPRESENTS ECONOMIC DATA. 
B—REPRESENTS CYCLICAL FLUCTUATION. 
C—REPRESENTS SECULAR TREND. 
An idealistic representation of a time series of economic
data containing the three elements, secular trend, cyclical
fluctuation, and irregular fluctuation, is given in Figure 1.
For particular data the secular trend may, of course, be other
than a straight line and the cyclical fluctuations other than
the simple sine curve.

The concept of secular trend or normal growth element for
which I contend is that of an element which increases or decreases
regularly according to some principle. That principle
may be linear, i. e., the addition of a constant amount in each
time interval as assumed in Figure 1, or it may be the compound
interest law, the addition of an equal percentage in
each time interval, or it may be a second degree parabola,
or some other law. In any case, however, the mathematical
function assumed to represent the secular trend should not
“fit” the cyclical or irregular fluctuations of the data. Now
the process of taking multiple differences is equivalent, on the
assumption that the series is an algebraic function of time,
to reducing the degree of the function of t by one for each
difference taken. Thus, if the secular trend be linear, if a
straight line be fitted to the data, and if the deviations
of the original series from the corresponding ordinates of the
straight line are taken (the cycles), then the coefficient of
correlation between the first differences of the original items
and that between the first differences of the cycles are identical.
This theorem may be proven as follows:
Let $x_0, x_1, x_2, \ldots, x_{n-1}$ be the original series of n items
and let $x = mt + b$ be the line of secular trend. The first differences
are of the form $x_K - x_{K-1}$. The cycles are of the form
$x_K - (m t_K + b)$. The first differences of the cycles are of the
form $x_K - x_{K-1} - m(t_K - t_{K-1})$. Since $t_K - t_{K-1}$ is a constant
for all values of K (being the time interval for which items
are taken) the first differences of the cycles differ from the
first differences of the original series by a constant. Hence
the coefficients of correlation obtained from them are identical.
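The invariance just proven can be checked numerically. What follows is a minimal sketch in Python and no part of the original paper: the two short series are made-up figures, the helper names (`pearson`, `first_differences`, `linear_deviations`) are illustrative, and the time interval is assumed to be one year. Because the theorem holds for any straight line, the slopes and intercepts below are arbitrary rather than fitted.

```python
import math


def pearson(x, y):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """x_K - x_{K-1} for K = 1, ..., n-1."""
    return [b - a for a, b in zip(series, series[1:])]


def linear_deviations(series, slope, intercept):
    """Cycles x_K - (m*t_K + b), assuming t_K = K (unit time interval)."""
    return [x - (slope * k + intercept) for k, x in enumerate(series)]


# Made-up illustrative figures, not data from the paper.
x = [61, 64, 70, 69, 75, 83, 80, 86, 95, 97]
y = [40, 38, 41, 37, 35, 38, 33, 31, 34, 29]

# Arbitrary (hypothetical) trend lines; any slope and intercept will do.
cycles_x = linear_deviations(x, slope=3.9, intercept=60.0)
cycles_y = linear_deviations(y, slope=-1.1, intercept=40.5)

r_original = pearson(first_differences(x), first_differences(y))
r_cycles = pearson(first_differences(cycles_x), first_differences(cycles_y))
print(round(r_original, 6), round(r_cycles, 6))  # the two values agree
```

Subtracting the line changes every first difference by the same constant, which the coefficient of correlation ignores; that is all the check relies upon.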

The coefficient of correlation between second differences is
identical with that found between the corresponding deviations
of items from the smoothed curve obtained by taking
three-year averages.* This theorem may be proven as follows:

Let $x_0, x_1, x_2, \ldots, x_{n-1}$ be the original series. The
second differences are of the form $x_{K+1} - 2x_K + x_{K-1}$. The
deviation of any item from the three-year average is

$$x_K - \frac{x_{K+1} + x_K + x_{K-1}}{3} = \frac{-x_{K+1} + 2x_K - x_{K-1}}{3}.$$

The form just given is obtainable from the form for second differences by
multiplying by $-\tfrac{1}{3}$. Therefore the coefficients of correlation
between second differences are the same as those between
deviations from three-year averages.†
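The identity between second differences and deviations from three-year averages can likewise be verified on any series. The sketch below is illustrative only; the function names and the sample figures are assumptions, not taken from the paper.

```python
def second_differences(series):
    """x_{K+1} - 2*x_K + x_{K-1} for every interior item K."""
    return [series[k + 1] - 2 * series[k] + series[k - 1]
            for k in range(1, len(series) - 1)]


def three_year_deviations(series):
    """Deviation of each interior item from its centered three-year average."""
    return [series[k] - (series[k - 1] + series[k] + series[k + 1]) / 3
            for k in range(1, len(series) - 1)]


# Made-up illustrative figures, not data from the paper.
x = [61, 64, 70, 69, 75, 83, 80, 86, 95, 97]

second = second_differences(x)
deviation = three_year_deviations(x)

# Each deviation should equal the second difference multiplied by -1/3.
ratios_hold = all(abs(d + s / 3) < 1e-9 for d, s in zip(deviation, second))
print(ratios_hold)  # True
```

Since both series of a pair are multiplied by the same factor of minus one third, the coefficient of correlation between them is left unchanged.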

* Moore has correlated deviations from three-year averages of crop yield per acre (weighted index
of nine leading crops) and production of pig-iron. He does not appear to realise that his coefficients
are the same as those for second differences. The coefficient for crops and pig-iron production with
a one-year lag of the latter ($r_{+1}$) for the period 1870-1911 is .254; for crops and general prices
for the same period the coefficient is .208. See Moore's Economic Cycles, pp. 107, 118.


† (Theorem for form of mth difference)
Theorem: The mth difference of the terms of a series
$x_1, x_2, \ldots, x_K, \ldots, x_n$
is of the form
$$\Delta^{(m)} x_K = C^m_m x_{K+m} - C^m_{m-1} x_{K+m-1} + C^m_{m-2} x_{K+m-2} - \cdots + (-1)^m C^m_0 x_K,$$
where $C^m_i$ indicates the number of combinations of m things taken i at a time.
The first differences of the series $x_1, x_2, \ldots, x_n$ are
$x_2 - x_1$; $x_3 - x_2$; $\ldots$; $x_{K+1} - x_K$; $\ldots$; $x_n - x_{n-1}$.
The second differences are
$x_3 - 2x_2 + x_1$, $\ldots$, $x_{K+2} - 2x_{K+1} + x_K$, $\ldots$, $x_n - 2x_{n-1} + x_{n-2}$.
Assume the mth difference to be of the form
$$x_{K+m} - m\,x_{K+m-1} + \frac{m(m-1)}{2!}\,x_{K+m-2} - \cdots + (-1)^i \frac{m!}{i!\,(m-i)!}\,x_{K+m-i} + \cdots + (-1)^m x_K.$$
Taking the difference of this expression for K+1 and for K we have
$$x_{K+m+1} - (m+1)\,x_{K+m} + \frac{m(m-1)+2m}{2!}\,x_{K+m-1} - \cdots + (-1)^i \left[\frac{m!}{i!\,(m-i)!} + \frac{m!}{(i-1)!\,(m-i+1)!}\right] x_{K+m-i+1} + \cdots$$
Consider the coefficient of the (i+1)th term:
$$\frac{m!}{i!\,(m-i)!} + \frac{m!}{(i-1)!\,(m-i+1)!},$$
which reduces to
$$\frac{m!\,[(m-i+1)+i]}{i!\,(m-i+1)!} = \frac{(m+1)!}{i!\,(m+1-i)!} = C^{m+1}_i.$$
Since we have shown that the 2nd differences are expressions with the binomial coefficients corresponding
to the order of difference and since, assuming that the mth difference appears with the binomial coefficients
corresponding to the terms of the expansion of $(1+1)^m$, we have proven that the (m+1)th
difference has as coefficients the terms of the expansion of $(1+1)^{m+1}$, then, by mathematical induction,
the proposition stated is true.

Professor A. A. Young has called my attention to the fact that the foregoing theorem is demonstrated
in the Institute of Actuaries’ Text Book, 2nd ed., Part II, p. 427.


These theoretical considerations taken in connection with
applications and tests of the variate difference method lead
me to the following conclusions concerning the meaning of the
coefficients of correlation for the raw figures and their multiple
differences:


In general, significant coefficients of correlation for the raw
figures of two time series indicate the similarity of the growth
elements of the two series, if large growth elements exist. The
existence or non-existence of such elements is readily determined
graphically or by fitting a simple function to the data.

Significant coefficients of correlation for first differences
indicate that the cyclical fluctuations synchronize, if there be
cyclical fluctuations. Evidence of such cycles may be secured
by plotting the deviations from the assumed secular trend.

Significant coefficients of correlation for second and in some
cases higher differences indicate, in general, that the irregular
fluctuations synchronize. Coefficients for higher differences
of short series contain a large spurious element which increases
with the order of difference. This element is due to the
tendency of the items to alternate in sign.


These conclusions assume that the magnitudes of the elements
due to secular trend, business cycles, and irregular events
vary in the order in which these elements are named. In
case we are dealing with data having marked cycles (of the
same variety) and are interested in the correlation of the cycles,
the coefficients for the cycles and first differences constitute
the proper basis for judgment rather than coefficients for
higher differences. Stability of coefficients for higher differences,
in such a case, probably means that the influence of all
the fluctuations except the irregular ones, i. e., the oscillations,
has been eliminated by the variate difference process.
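The reading given above to the coefficients for raw figures and for first differences can be illustrated on artificial data. The sketch below is an assumption-laden toy and not the author's computation: the two series are hypothetical, built with opposite linear trends, a shared nine-year sine cycle, and unrelated sinusoids standing in for irregular fluctuations.

```python
import math


def pearson(x, y):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """x_K - x_{K-1} for K = 1, ..., n-1."""
    return [b - a for a, b in zip(series, series[1:])]


YEARS = range(46)  # hypothetical series of 46 annual items

# Rising trend + shared cycle + stand-in irregular element (all assumed).
series_a = [60 + 2.3 * t + 6 * math.sin(2 * math.pi * t / 9)
            + 1.5 * math.cos(2.1 * t) for t in YEARS]
# Falling trend + the same cycle + a different stand-in irregular element.
series_b = [76 - 0.6 * t + 5 * math.sin(2 * math.pi * t / 9)
            + 1.5 * math.sin(3.7 * t) for t in YEARS]

r_raw = pearson(series_a, series_b)
r_first = pearson(first_differences(series_a), first_differences(series_b))
print("raw figures:", round(r_raw, 2))
print("first differences:", round(r_first, 2))
```

In this constructed case the raw figures give a negative coefficient, reflecting the opposed growth elements, while the first differences give a positive one, reflecting the common cycle.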
TABLE IX.


LONDON CLEARINGS (COLUMN 1) WITH THE 9-YEAR MOVING AVERAGE (2) AND DEVIATIONS
FROM MOVING AVERAGE (3), STRAIGHT LINE (4), PARABOLA (5) AND COMPOUND
INTEREST LAW (6); SAUERBECK’S PRICES (7) WITH THE 9-YEAR MOVING
AVERAGE (8) AND DEVIATIONS FROM MOVING AVERAGE (9), STRAIGHT LINE (10)
AND PARABOLA (11); 1868-1913. ALSO DEVIATIONS OF LONDON CLEARINGS FROM
THE TWO STRAIGHT LINES FITTED TO DATA FOR 1868-1896 AND FOR 1897-1913
(12) AND DEVIATIONS OF SAUERBECK’S PRICES FROM TWO STRAIGHT LINES
SIMILARLY FOUND (13).

(a) In £100,000,000. (b) Relative Indices.


Judgment concerning the correlation of cyclical fluctuations
of two series must be preceded by elimination of the secular
trend. The choice of a function to represent the secular trend,
indeed the choice of the method of eliminating the trend,
whether by curve fitting or otherwise, these are questions
fundamental to the process. I will test the effect that various
suppositions concerning the secular trend have upon the correlation
coefficients between the deviations from the various
trends (cycles) resulting from the suppositions. The two
series chosen for this test are London clearings and Sauerbeck’s
price indices given with other data in Table IX. These
series are chosen, first, because the secular trends are dissimilar,
second, because the trends differ most widely from
the linear of any which could be found, and, third, because
the variate difference correlation coefficients for these series
are puzzling to the author of the variate difference correlation
method. “Student” applied his method to London
clearings per capita and Sauerbeck’s prices and to marriage
rate and wages, finding the following coefficients:


I. CLEARINGS AND PRICES.

1st D. | 2nd D. | 3rd D. | 4th D. | 5th D. | 6th D.
+.51 | +.30 | +.07 | +.11 | +.05 |

II. MARRIAGE RATE AND WAGES.

+.67 | +.58 | +.52 | +.55


He says, “The difference between I and II is very marked,
and would seem to indicate that the causal connection between
index numbers and Bankers’ clearing house rates is not altogether
of the same kind as that between marriage rate and
wages, though all four variables are commonly taken as indications
of the short period trade wave.”*

Figure 2 presents Sauerbeck’s indices with linear and parabolic
secular trends, the functions being fitted to the data
by the method of moments. Figure 3 presents London bank
clearings with linear, parabolic, and exponential functions
fitted to the data. Figure 4 presents both series with their
respective nine-year moving averages, nine years being determined
by inspection as the length of the business wave or
cycle. Figures 5, 6, 7, and 8 present the deviations, positive
or negative, of the two series from the various secular trends.
The last named figures all show a striking correspondence of
the cyclical fluctuations of the two series. It will be noticed
that fluctuations of clearings show a tendency to precede or
forecast the fluctuations in prices.

* Biometrika, April, 1914, p. 180.

Figures 5, 6, 7, and 8 throw some light on Babson’s hypothesis
that economic action and reaction are equal, i. e., that
consecutive areas above and below the line of normal growth
should be equal for a correct normal line. It is true that the 
sum of areas above and below the lines are roughly equal; that, 
the method of curve fitting accomplishes. But what are con- 
secutive areas, the long-time areas of Figures 6 and 8 or the 
short-time areas of Figures 5 and 7? It is obvious that we 
would get still more heterogeneous results, as regards positive 
and negative areas, if we should use series of various lengths, 
say 20 years or 70 years, or series of monthly or quarterly 
rather than annual data. 


FIGURE 2.

SAUERBECK'S INDEX NUMBERS OF WHOLESALE PRICES,
1868-1913, WITH STRAIGHT LINE AND PARABOLA
FITTED TO DATA. EQUATIONS OF LINE AND CURVE:

(A) $y = -.638t + 76.25$
(B) $y = +.051t^2 - .561t + 70.18$

(Origin at 1891)
FIGURE 3

LONDON BANK CLEARINGS IN HUNDREDS OF MILLIONS POUNDS STERLING,
1868-1913, WITH STRAIGHT LINE, PARABOLA, AND COMPOUND INTEREST
CURVE FITTED TO DATA. EQUATIONS OF LINE AND CURVES:

(A) $y = +2.31t + 61.5$
(B) $y = +.0744t^2 + 3.39t + 67.3$
(C) $y = (75.4)(2.033)^t$

(Origin at 1891)
FIGURE 4

SAUERBECK’S PRICE INDICES (P) AND LONDON CLEARINGS (C), 1868-1913, WITH
THEIR RESPECTIVE NINE-YEAR MOVING AVERAGES, 1872-1909.


FIGURE 5

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK’S PRICE INDICES (P)
FROM THEIR RESPECTIVE NINE-YEAR MOVING AVERAGES AS
SECULAR TRENDS, 1872-1909.


FIGURE 6

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK’S PRICES (P) FROM
THEIR RESPECTIVE LINEAR SECULAR TRENDS, 1868-1913.


FIGURE 7

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK’S PRICES (P) FROM
THEIR RESPECTIVE PARABOLIC SECULAR TRENDS, 1868-1913.


FIGURE 8

DEVIATIONS OF LONDON CLEARINGS FROM TREND AS COMPOUND INTEREST
CURVE (C) AND OF SAUERBECK’S PRICES FROM LINEAR TREND (P), 1868-1913.
TABLE X.


COEFFICIENTS OF CORRELATION BETWEEN SAUERBECK’S PRICE INDICES AND
LONDON CLEARINGS, 1868-1913.


A. RAW FIGURES AND THEIR DIFFERENCES.

B. DEVIATIONS FROM 9-YEAR MOVING AVERAGE AND DIFFERENCES.

C. DEVIATIONS FROM STRAIGHT LINES AND DIFFERENCES.

D. DEVIATIONS FROM PARABOLAS AND DIFFERENCES.

E. DEVIATIONS FROM COMPOUND INTEREST LAW FOR CLEARINGS AND STRAIGHT LINE FOR PRICES AND
DIFFERENCES.
The main question upon which we wish to get light is, however,
the effect of the various methods of eliminating the
secular trend upon the coefficients of correlation between corresponding
deviations. Table X gives the coefficients of
correlation between Sauerbeck’s indices and London clearings
taking the raw figures and their differences, first to sixth, and
also taking deviations from various secular trends and their
differences, first to third. Coefficients are presented for concurrent
items and for a lag in both directions, a lag of one for
differences and a lag of one and of two for deviations or cycles.
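A coefficient "for a lag" is obtained by pairing each item of one series with the item of the other series a fixed number of years later. The sketch below shows one way this might be computed; the function name `lagged_correlation` and the sample figures are assumptions, and the sign convention (positive for a lag of prices, negative for a lag of clearings) is the one stated in the heading of Table XI.

```python
import math


def pearson(x, y):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(clearings, prices, lag):
    """Correlate clearings of year K with prices of year K + lag.

    A positive lag means prices lag behind clearings; a negative lag
    means clearings lag behind prices.
    """
    n = len(clearings)
    if lag >= 0:
        return pearson(clearings[:n - lag], prices[lag:])
    return pearson(clearings[-lag:], prices[:n + lag])


# Made-up figures: prices repeat the clearings of the previous year.
clearings = [3, 7, 4, 9, 6, 11, 8, 13, 10, 15]
prices = [5] + clearings[:-1]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(clearings, prices, lag), 3))
```

With these artificial figures the maximum falls at a lag of one year for prices, as it must by construction. Note that every year of lag removes one pair of items from the computation.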

The coefficients of correlation for the raw figures (-.37,
-.31 and -.28) show that the secular trends of prices and
clearings are in opposite directions. The coefficients for the
first differences of the raw figures and of all the deviations
indicate an appreciable positive correlation for concurrent
items ($r'_0$) and for prices one-year lag ($r'_{+1}$), with the coefficient
$r'_0$ larger. The coefficients for second and higher differences
of the raw figures, and deviations as well, decrease as the order
of difference increases; the coefficients for one-year lag of prices
decreasing more rapidly than for concurrent items. These
facts indicate that the maximum correlation of business cycles
(including the irregular fluctuations) is for clearings preceding
prices by less than half a year, say, four months. There is,
however, an unknown element of spurious correlation between
clearings and prices because the former are dependent upon
prices as well as physical volume of goods exchanged and
speculation. If this spurious element, due not to the method
but to the nature of the data, were excluded, it is probable
that the maximum correlation would be found for clearings
preceding prices by more than six months, perhaps by a year.

The coefficients of correlation for the deviations all agree in
locating the maximum, and therefore the lag of prices, at less
than a year. The actual maximum found was for concurrent
items, except for deviations from the nine-year averages which
give a maximum at one year lag of prices. Since our judgment
is based upon the relative values of the coefficients for
various degrees of lag, rather than upon their absolute values,
the type of secular trend chosen does not appear to have great
significance. Curve-fitting, however, does appear to be preferable
to the taking of moving averages because, first, all the
items may be used in determining the correlation and, second,
the coefficients for deviations and first differences disagree in
their location of the maximum when deviations from the moving
average are taken but agree in all other cases. Of course
it might be possible to use the deviations of all the terminal
items from the moving average, if values of the latter were
extrapolated; fitting a curve, graphically or otherwise, to the
moving average is the obvious solution of this problem.
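The loss of terminal items under a moving average is easily seen in a small sketch. The code below is illustrative only: the function names are assumptions and the series is a hypothetical one, chosen so that the result can be checked by hand. A centered nine-year average of annual items for 1868-1913 exists only for 1872-1909, the span named in the captions of Figures 4 and 5.

```python
def centered_moving_average(series, window=9):
    """Centered moving average, defined only where the full window fits."""
    half = window // 2
    return [sum(series[k - half:k + half + 1]) / window
            for k in range(half, len(series) - half)]


def moving_average_deviations(years, series, window=9):
    """Years retained and the deviations of their items from the average."""
    half = window // 2
    average = centered_moving_average(series, window)
    kept_years = list(years)[half:len(series) - half]
    deviations = [x - m for x, m in zip(series[half:], average)]
    return kept_years, deviations


years = list(range(1868, 1914))
# Hypothetical series (a parabola in time), not data from the paper.
series = [float(year - 1868) ** 2 for year in years]

kept, dev = moving_average_deviations(years, series)
print(kept[0], kept[-1], len(kept))  # 1872 1909 38
```

Four items are lost at each end, so that 38 of the 46 items remain for correlation; deviations from a fitted curve, by contrast, are available for every year.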

The results appearing in Table X make it clear that the 
coefficients for higher differences give little indication of the 
correspondence of the business cycles in the two series, which 
correspondence is clearly shown by the charts and the correla- 
tion coefficients for deviations from the secular trend and for 
first differences. 

V. 

If our interest were in the absolute degree of correlation 
between the cycles of two series for selected pairs of items, 
the nature of the curves used to represent the secular trends 
and the closeness of the fit to the data would be of primary im- 
portance. In case we are dealing with series in which the 
secular trends are non-linear, such as clearings and prices, but 
if, nevertheless, we use straight lines to represent the trends 
and correlate deviations therefrom, the coefficients resulting 
will undoubtedly contain a large spurious element, positive 
or negative. This is illustrated by the discrepancy between 
the coefficients, +.92 and +.73, obtained for deviations from 
straight lines and parabolas, respectively, of London clearings 
and Sauerbeck’s prices (Table X). The former coefficient 
(+.92) undoubtedly contains a spurious element amounting 
to some twenty points. The spurious element is positive in 
this case, apparently, because of the downward long time 
trend from 1868 to 1896 and the upward trend from 1897 to 
1913 which results in the pairing of large negative deviations 
for the period 1884-1899, when the deviations are from straight 
lines. 

To test the effect of dividing the data of Sauerbeck’s prices
and London clearings into two homogeneous sub-periods, i. e.,
one of falling prices, 1868-1896, and one of rising prices,
1897-1913, two linear secular trends were found for each series
and coefficients of correlation were computed for deviations
from these trends and for their first differences. The lines
and their equations are given in Figure 9; the deviations appear
in Figure 10; the coefficients are presented in Table XI.
The maximum coefficients for the deviations are $r_0 = +.71$ and
$r_{+1} = +.70$ for the periods 1868-1896 and 1897-1913, respectively.
The maximum coefficients for corresponding first
differences, $r'_0 = +.61$ and $r'_{+1} = +.47$, are consistent with
those obtained for the entire period as shown by Table X.
The coefficients for first differences, $r'_0$, between deviations
from various trends (see A, B, C, D, and E of Table X) are
+.42, +.42, +.55, +.49, and +.50.


TABLE XI.


COEFFICIENTS OF CORRELATION BETWEEN DEVIATIONS OF SAUERBECK’S PRICES
AND OF LONDON CLEARINGS FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS
FOR THE TWO PERIODS 1868-1896 AND 1897-1913 TOGETHER WITH COEFFICIENTS
FOR FIRST DIFFERENCES; VARIOUS DEGREES OF LAG OF PRICES (+) AND OF
CLEARINGS (-).


Coefficients of Correlation.

$r_0$ | $r_{+1}$ | $r_{+2}$

1868-1896.

+.63
+.34

1897-1913.

+.43 | +.70 | +.20
+.27 | +.47 | +.04











Division of the data into two sections throws new light on the 
problem. Clearings and prices fluctuated concurrently during 
the first period, but prices lagged behind clearings by a year 
during the period 1897-1913. Perhaps increased speculation
has changed the character of clearings during the second period. 
Whatever may be the cause, the fluctuations of English prices 
and clearings are shown to be related in the same fashion as are 
those for the United States during the same period (see AD, 
BD, and CD of Table IV). 
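The effect of dividing the data into two sub-periods can be illustrated on artificial series. The sketch below is not the author's computation: the series are hypothetical, each having a trend that bends at 1896 and an unrelated sinusoid standing in for its fluctuations, and the lines are fitted by least squares, which for a straight line leads to the same equations as the method of moments. The function names are assumptions.

```python
import math


def pearson(x, y):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def line_deviations(t, y):
    """Deviations of y from its least-squares straight line in t."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    slope = (sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y))
             / sum((a - mean_t) ** 2 for a in t))
    intercept = mean_y - slope * mean_t
    return [b - (slope * a + intercept) for a, b in zip(t, y)]


def split_line_deviations(years, y, last_year_of_first_period):
    """Deviations from two lines fitted separately to the two sub-periods."""
    k = sum(1 for year in years if year <= last_year_of_first_period)
    return line_deviations(years[:k], y[:k]) + line_deviations(years[k:], y[k:])


def first_differences(series):
    return [b - a for a, b in zip(series, series[1:])]


years = list(range(1868, 1914))
# Hypothetical series: bent trends plus unrelated stand-in fluctuations.
prices = [60 + 1.5 * abs(yr - 1896) + 3 * math.sin(2.1 * (yr - 1868))
          for yr in years]
clearings = [40 + 0.5 * (yr - 1868) + 2.5 * max(0, yr - 1896)
             + 3 * math.sin(3.7 * (yr - 1868)) for yr in years]

r_single = pearson(line_deviations(years, prices),
                   line_deviations(years, clearings))
r_split = pearson(split_line_deviations(years, prices, 1896),
                  split_line_deviations(years, clearings, 1896))
r_first = pearson(first_differences(prices), first_differences(clearings))
print(round(r_single, 2), round(r_split, 2), round(r_first, 2))
```

In this constructed case the deviations from a single straight line are highly correlated merely because both trends bend at the same date, whereas the deviations from the two sub-period lines, and the first differences of the raw items, show little correlation.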
TABLE XII.


COEFFICIENTS OF CORRELATION BETWEEN RELATIVE WHOLESALE PRICES AND PIG-IRON
PRODUCTION OF THE UNITED STATES, DEVIATIONS AND FIRST DIFFERENCES
AS FOLLOWS:

1879-1913.
A. DEVIATIONS FROM LINEAR SECULAR TRENDS AND FIRST DIFFERENCES.
B. DEVIATIONS OF PRICES FROM PARABOLA AND OF PIG-IRON PRODUCTION FROM COMPOUND INTEREST
CURVE, AND THEIR FIRST DIFFERENCES.

1879-1896.
C. DEVIATIONS FROM LINEAR SECULAR TRENDS AND THEIR FIRST DIFFERENCES.

1897-1913.
D. DEVIATIONS FROM LINEAR SECULAR TRENDS AND THEIR FIRST DIFFERENCES.

1879-1896.
E. DEVIATIONS OF PRICES FROM LINEAR SECULAR TREND, AND OF PIG-IRON PRODUCTION FROM COMPOUND
INTEREST CURVE (COMPUTED FOR DATA 1879-1913).

1897-1913.
F. DEVIATIONS OF PRICES FROM LINEAR SECULAR TREND, AND OF PIG-IRON PRODUCTION FROM COMPOUND
INTEREST CURVE (COMPUTED FOR DATA 1879-1913).

1879-1913.
G. DEVIATIONS OF PRICES FROM THE TWO LINEAR SECULAR TRENDS (1879-1896 AND 1897-1913) AS A
CONTINUOUS SERIES AND OF PIG-IRON PRODUCTION FROM COMPOUND INTEREST CURVE.
H. FIRST DIFFERENCES OF RAW ITEMS.

Prices concurrent (0), lag (+), or previous (-) to pig-iron production as indicated by subscript of r.


Symbol. | Period. | Items Paired. | Coefficients of Correlation.
(A) | 1879-1913 | Deviations (a) |
 | | First Differences |
(B) | 1879-1913 | Deviations |
 | | First Differences |
(C) | 1879-1896 | Deviations |
 | | First Differences |
(D) | 1897-1913 | Deviations |
 | | First Differences |
(E) | 1879-1896 | Deviations |
(F) | 1897-1913 | Deviations |
(G) | 1879-1913 | Deviations |
(H) | 1879-1913 | First Differences | -.31 | +.41








(a) Coefficient r4g=+.53. 
Table XII presents coefficients of correlation between deviations
from various secular trends and their first differences
of relative wholesale prices and pig-iron production of the
United States for the period 1879-1913 and the sub-periods,
1879-1896 and 1897-1913. The various secular trends and
their equations are given in Figures 11 and 12; the deviations
appear in Figures 13 and 14; the data appear in Table
XIII.

For the period 1879-1913 (see A, B, G, and H) the cycles of
prices and pig-iron production are concurrent. For the period
1879-1896 (see C and E) the pig-iron cycles precede price
cycles by a year. For the period 1897-1913 the cycles of the
two series are strongly concurrent (see D and F). The
coefficients of correlation for deviations from the linear trends,
1879-1896, $r_0 = +.48$ and $r_{+1} = +.61$ and $r_{+2} = +.41$, agree in
locating the maximum at the same point as those for first
differences, $r'_0 = +.23$ and $r'_{+1} = +.55$ and $r'_{+2} = .00$ (see C).
The coefficients for deviations from the linear trends, 1897-1913,
$r_0 = +.41$ and $r_{+1} = +.35$, are likewise supported by those
for first differences, $r'_0 = +.54$ and $r'_{+1} = +.17$ (see D). Using
deviations of prices from the two linear secular trends as a
continuous series and of pig-iron production from the compound
interest curve, 1879-1913, we have the coefficients
$r_0 = +.45$ and $r_{+1} = +.44$ (see G). It is obvious that the coefficients
for the whole period and the two sub-periods are consistent.
At present general prices and pig-iron production
fluctuate concurrently.
FIGURE 9.

SAUERBECK'S INDEX NUMBERS OF WHOLESALE PRICES (P)
AND LONDON BANK CLEARINGS (C), 1868-1913, WITH TWO
STRAIGHT LINES FITTED TO EACH SERIES, 1868-1896 AND
1897-1913, RESPECTIVELY. EQUATIONS OF LINES:

PERIOD | PRICES | CLEARINGS
1868-1896 | (A) $y = -1.64t + 68$ | (B) $y = +1.00t + 69$
1897-1913 | (A') $y = +2.16t + 57$ | (B') $y = +5.31t + 43$
(Origin at 1891)


FIGURE 10.

DEVIATIONS OF LONDON CLEARINGS (C) AND SAUERBECK’S PRICE INDICES (P)
FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS FOR THE
TWO PERIODS 1868-1896 AND 1897-1913.
FIGURE 11.

WHOLESALE PRICE INDICES FOR THE UNITED STATES WITH STRAIGHT
LINE AND PARABOLA FITTED TO DATA, 1879-1913, AND TWO STRAIGHT
LINES FITTED TO THE DATA FOR THE PERIODS 1879-1896 AND 1897-1913,
RESPECTIVELY. EQUATIONS OF LINES AND CURVE:

1879-1913 (A) $y = +0.1475t + 113.9$
1879-1913 (B) $y = +0.128t^2 + 0.189t + 102.5$
1879-1896 (C) $y = -3.10t + 131.7$
1897-1913 (D) $y = +6.60t + 49.5$

(Origin for B at 1894; for A, C & D at 1879)
FIGURE 12.

PIG-IRON PRODUCTION OF THE UNITED STATES WITH
STRAIGHT LINE AND COMPOUND INTEREST LAW FITTED
TO DATA, 1879-1913, AND TWO STRAIGHT LINES
FITTED TO DATA FOR THE PERIODS 1879-1896 AND
1897-1913, RESPECTIVELY.

EQUATIONS OF LINES AND CURVES:
1879-1913 (A) $y = +7.85t - 3.00$
1879-1913 (B) $y = 33.75\,(1.069)^t$
1879-1896 (C) $y = +3.59t + 32.0$
1897-1913 (D) $y = +11.90t - \ldots$
(Origin at 1879)
FIGURE 13.

DEVIATIONS OF UNITED STATES PRICE INDICES (P) FROM PARABOLA AND PIG-IRON
PRODUCTION (I) FROM COMPOUND INTEREST CURVE, 1879-1913.


FIGURE 14.

DEVIATIONS OF UNITED STATES PRICE INDICES (P) AND PIG-IRON PRODUCTION
(I) FROM THEIR RESPECTIVE LINEAR SECULAR TRENDS FOR
THE TWO PERIODS, 1879-1896 AND 1897-1913.
TABLE XIII.


WHOLESALE PRICE INDICES FOR THE UNITED STATES (COLUMN 1) WITH THE DEVIATIONS
FROM STRAIGHT LINE (2) AND PARABOLA (3) 1879-1913, AND FROM TWO
STRAIGHT LINES, 1879-1896 AND 1897-1913 (4) AS SECULAR TRENDS. ALSO PIG-IRON
PRODUCTION IN THE UNITED STATES (5) WITH THE DEVIATIONS FROM STRAIGHT
LINE (6) AND FROM COMPOUND INTEREST LAW (7) 1879-1913 AND FROM TWO
STRAIGHT LINES, 1879-1896 AND 1897-1913 (8) AS SECULAR TRENDS.


(Equations of Secular Trends in Figures 11 and 12.)

(a) … and Bureau of Labor Statistics indices are reduced to a continuous series with the …
(b) From Statistical Abstract of the United States, 1914, p. 664. The units here are 100,000 long tons.
(c) Equation of line of secular trend, $y = 7.852t - 3.00$, origin at 1879. This is the only trend having
any negative ordinate for the period studied.


The conclusion just stated is of especial interest because it
is in conflict with that of Professor H. L. Moore.* In his
Economic Cycles, he found the following coefficients between
the cycles of crop yield and pig-iron production, using three-year
averages in all cases: $r_0 = .625$; $r_{+1} = .719$; $r_{+2} = .718$;
$r_{+3} = .697$; $r_{+4} = .572$ (see Table XIV). Correlating the cycles of
crop yield with cycles of general prices, he obtains the coefficients
$r_{+3} = .786$, $r_{+4} = .800$, and $r_{+5} = .710$. He concludes from
these coefficients, first, “that the cycles in the yield per acre
of the crops are intimately related to the cycles in the activity
of industry, and that it takes between one and two years for
good and bad crops to produce the maximum effect upon the
activity of the pig-iron industry” and, second, that “the
cycles in the yield per acre of the crops are
intimately connected with the cycles of general prices, and the
lag in the cycles of general prices is approximately four years.”*
It seems to me that this conclusion is not warranted because of
the poor fit of a linear secular trend to pig-iron production.
The ordinate of the secular trend is negative for the years 1871,
1872, and 1873. The deviations from the secular trend are all
positive for the periods 1871-1877 and 1902-1910 and all
negative for the period 1878-1901. The deviations of crop
yield are, with few exceptions, positive from 1871 to 1879 and
1903 to 1910 and negative from 1880 to 1902.† It appears
probable, then, that the correlation coefficients upon which he
relies contain a large spurious element. At any rate the
differences between the coefficients, amounting to less than .02
in most cases, on which his judgment is based, cannot be considered
significant.

* Moore, H. L. Economic Cycles, p. 110.
† Ibid., p. 122.

* Moore, H. L. Economic Cycles, pp. 110, 122.
† Ibid., p. 131.

Waiving the question of Moore’s use of three-year progressive
averages to form the series for which correlation coefficients
are computed, which usage throws serious doubt upon
the reliability of his conclusions, I will test the correlation
between the series of three-year averages by computing the
coefficients between first differences of those items. The
coefficients are given in Table XIV. The coefficients between
(1) crop-yield and pig-iron production and between (2) crop-yield
and general prices are not significant. The former group
of coefficients (1) shows a curious alternation in value which,
examination of the basic series demonstrates, is due to a few
predominating items in pig-iron production after 1905. The
latter group of coefficients (2) shows a maximum at four-years
lag of prices but the coefficient ($r'_{+4} = +.39$) is not much larger
than the maximum coefficient ($r' = +.32$) found for first
differences of the two random series previously analyzed (see
Table V). The coefficients for pig-iron production and general
prices (3) reach a maximum ($r'_{+1} = +.59$) at one year lag of
prices. This coefficient is probably significant. It gives,
however, a result at variance with Moore’s conclusions but
consistent with the conclusions obtained when the period
1879-1913 was divided into two sub-periods (see Table XIII).
Moore’s use of three-year averages probably results in much
higher coefficients than would result from annual figures.
Even so, the coefficients between first differences are small for
crops and the indices of the industrial and business activity.
Moore’s object was to show that the cycles of business reflect
the cycles in crops, not merely, having assumed that cycles
exist, to find the lag. For this object a good “fit” of secular
trend to data is imperative. The method of first differences,
then, is valuable because it reveals spurious correlation between
deviations from secular trends when the fit is not good.


TABLE XIV.


(A) COEFFICIENTS OF CORRELATION BETWEEN DEVIATIONS OF YIELD PER ACRE OF
NINE CROPS FROM THE LINEAR SECULAR TREND AND SIMILAR DEVIATIONS OF
PIG-IRON PRODUCTION AND OF GENERAL PRICES, 1870-1911; ITEMS OF THE VARIOUS
SERIES ARE THREE-YEAR PROGRESSIVE AVERAGES.

(B) COEFFICIENTS OF CORRELATION BETWEEN FIRST DIFFERENCES OF THE RESPECTIVE
DEVIATIONS.

The subscript i in $r_i$ indicates the lag in prices and pig-iron production compared with crops, or of prices as
compared with pig-iron.

(A) Moore’s Coefficients. (a)

Crop Yield Correlated with: | $r_0$ | $r_{+1}$ | $r_{+2}$ | …

(B) Coefficients between First Differences.

Crop Yield Correlated with: | $r'_0$ | …
(1) Pig-Iron Production
(2) General Prices

Pig-Iron Production Correlated with:
(3) General Prices

(a) Moore, H. L. Economic Cycles, pp. 110-122. Series on pp. 131, 134.
Résumé.


The variate difference correlation method has been in- 
vented to eliminate spurious correlation due to position of 
items in time or space. 

The method involves the assumption that the taking of 
multiple differences leads to series of random variates. In 
practice for short series this assumption is not fulfilled. 

Coefficients for higher differences of short series tend to al- 
ternate in sign and to conceal rather than to reveal the nature 
of the correlation between the series being tested. 

Stability of coefficients for higher differences appears to 
have little significance for short series, and perhaps for long 
series as well. The assumption that the series correlated are 
made up of variates “randomly distributed in time,” if ful- 
filled, will lead to stable coefficients for successive differences. 
However, though this condition is sufficient for stability it is 
not necessary. 

In testing economic series for correspondence of their 
cyclical fluctuations, especially in determining the relative 
position of the cycles upon the assumption that there are 
cycles, the correlation coefficients between deviations from a 
linear secular trend together with coefficients for first differ- 
ences constitute a reliable basis for judgment. 
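
The comparison recommended in the preceding paragraph may be illustrated by a short computation. The sketch below is only an illustration under stated assumptions: both series are hypothetical (a common eight-year cycle superposed on opposite linear trends), and the function names are chosen here for convenience. It computes the coefficient for the raw items, for deviations from a fitted straight line, and for first differences.

```python
import math


def pearson(x, y):
    """Coefficient of correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def deviations_from_linear_trend(series):
    """Residuals from a least-squares straight line fitted against time."""
    n = len(series)
    mean_t, mean_s = (n - 1) / 2, sum(series) / n
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in enumerate(series))
             / sum((t - mean_t) ** 2 for t in range(n)))
    return [s - (mean_s + slope * (t - mean_t)) for t, s in enumerate(series)]


def first_differences(series):
    """Change from each item to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical data (not from the paper): one eight-year cycle shared by
# two series whose secular trends run in opposite directions.
cycle = [math.sin(2 * math.pi * t / 8) for t in range(40)]
series_a = [100 + 2.0 * t + 5 * c for t, c in enumerate(cycle)]
series_b = [50 - 0.5 * t + 3 * c for t, c in enumerate(cycle)]

r_raw = pearson(series_a, series_b)
r_trend = pearson(deviations_from_linear_trend(series_a),
                  deviations_from_linear_trend(series_b))
r_diff = pearson(first_differences(series_a), first_differences(series_b))

print(f"raw items: {r_raw:.3f}")
print(f"deviations from linear trend: {r_trend:.3f}")
print(f"first differences: {r_diff:.3f}")
```

In this constructed case the raw coefficient is negative because the opposing trends dominate, while the two methods named in the text both recover the common cycle. With real series the two coefficients will generally differ, which is why the text treats them as a joint basis for judgment.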

When the question is one of the existence or non-existence 
of similar cycles in two time series great care must be used in 
the choice of the function used to represent the secular trend 
and in the nature of the fit of the curve or line to the data. The 
method of first differences is an extremely valuable aid in 
investigating such a question. 

Coefficients of correlation between second differences may 
give information concerning minor oscillations as distinct from 
secular trend and major cycles. Even for this purpose the 
use of higher than second differences appears to be unreliable, 
especially so for short series. The coefficients of correlation 
between second differences are identical with those between 
deviations from three-year progressive averages. 
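
The identity stated in the last sentence can be checked numerically. The sketch assumes that a "three-year progressive average" is the centered average of an item with its two neighbors; under that assumption x_t - (x_{t-1} + x_t + x_{t+1})/3 = -(x_{t+1} - 2x_t + x_{t-1})/3, so each deviation is the second difference multiplied by -1/3, and the coefficient of correlation is unchanged. The series are hypothetical pseudo-random numbers.

```python
import math
import random


def pearson(x, y):
    """Coefficient of correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def second_differences(s):
    """Second difference centered on each interior item."""
    return [s[i + 1] - 2 * s[i] + s[i - 1] for i in range(1, len(s) - 1)]


def deviations_from_three_year_average(s):
    """Item minus the centered three-item average (an assumed reading of
    the 'three-year progressive average')."""
    return [s[i] - (s[i - 1] + s[i] + s[i + 1]) / 3 for i in range(1, len(s) - 1)]


# Hypothetical series, generated with a fixed seed so the run is repeatable.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [0.5 * xi + rng.uniform(0, 10) for xi in x]

r_second = pearson(second_differences(x), second_differences(y))
r_average = pearson(deviations_from_three_year_average(x),
                    deviations_from_three_year_average(y))
print(abs(r_second - r_average) < 1e-9)
```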

The method of measuring correlation between cycles of time 
series, that is both easy of application and reliable, is the 
method of first differences. In general, however, this method 
should be supplemented by curve fitting. To secure a picture 
of the cycles, it is, of course, necessary to take deviations from 
a closely fitted curve. 

Finally, curve fitting to eliminate the secular trend of a time 
series should always be adapted to the problem in hand and 
interpretation of coefficients of correlation between time 
series should be made with continual reference to the funda- 
mental data. Important light may be secured by dividing 
statistical series into more homogeneous sub-series and analyz- 
ing the latter. The nature of the data is as important as 
the method to be applied. Rules-of-thumb concerning 
method or data are apt to lead to pitfalls. 


SOCIAL-ECONOMIC GROUPS OF THE UNITED 
STATES. 


GAINFUL WORKERS OF UNITED STATES, CLASSIFIED BY 
SOCIAL-ECONOMIC GROUPS OR STRATA. 


By ALBA M. EDWARDS, PH.D., Washington, D. C. 


In reporting the occupations of the gainful workers of the 
United States, it has been the custom of the Bureau of the 
Census to group the occupations into a few general divisions, 
each general division, as agriculture, manufactures, etc., con- 
stituting a large section of the broad field in which gainful 
labor is occupied. No attempt has been made by the Bureau 
of the Census to group the gainful workers according to social- 
economic groups or strata. Yet, there is a real need for such an 
additional grouping, for while much of our discussion and much 
of our labor legislation deals with the workers in a certain 
section or in certain sections of the industrial field, as persons 
engaged in agriculture, persons engaged in manufacturing and 
mechanical industries, etc.; another large part of our discus- 
sion—if not as yet of our legislation—deals with large social- 
economic groups, as proprietors, skilled workers, laborers, 
servants, professional persons, etc., with but minor regard to 
the section of the broad industrial field in which the workers 
in each respective group are occupied. Those discussing or 
desiring to discuss such social-economic groups have been 
hampered by the lack of any such grouping of the workers 
reported in the Thirteenth Census report on occupations. In 
the following pages such a grouping is presented. 

There are those who desire a grouping of occupations accord- 
ing to skill. In many respects such a grouping, if it could be 
carried out, would be an admirable and a useful one; but a 
complete grouping of occupations according to skill is impos- 
sible, since many occupations do not lend themselves to such a 
grouping. For example, proprietors usually are distinguished 
from the other workers in the same industry or business, not 
by a difference in skill but rather by a difference in the possession 
of property, credit, and business and executive ability. 
Where, in a scale of skill, would we place the policeman? 
Possibly we might, by a stretch of the imagination, classify 
according to skill the surgeon, but not the physician; the acro- 
bat, but not the actor; the sculptor, but not the artist; the 
organist, but not the opera singer. Indeed, none of the pro- 
prietary, official, managerial, clerical, or strictly professional 
pursuits lends itself readily to a classification by skill; and it is 
doubtful whether any of them may be properly so classified, 
since in none of them is skill the chief characteristic. In fact, 
in a grouping such as here presented, we can properly classify 
according to skill only those occupations in which the expendi- 
ture of muscular force is one of the chief characteristics. It is 
impossible, of course, to draw a hard and fast line between 
those occupations which are characterized chiefly by the 
exercise of muscular force or manual dexterity, and those 
which are characterized chiefly by the exercise of mental force 
or ingenuity. In other words, it is impossible to draw a hard 
and fast line between the hand workers and the head workers. 
But such a line may be drawn sufficiently accurately for our 
purpose. 

The grouping of the gainful workers here presented is not 
according to skill, except in the case of the manual workers, 
whose occupations lend themselves more or less readily to a 
classification by skill. The aim has been merely to divide 
the gainful workers of the United States into a few large, social- 
economic groups. 

The grouping given below is the result of a rearrangement of 
the occupations and occupation groups of Table I of the Thir- 
teenth Census report on occupations. The occupations of 
Table VI of the same report are in much greater detail and for 
this reason possibly could have been grouped with a higher 
degree of accuracy, but it is not believed that the added accu- 
racy would compensate for the great additional amount of 
labor involved. The occupations of Table I were especially 
preferred for the reason that, since the occupations are classi- 
fied in the same manner for each state, in Table II, and for 
each city of 100,000 population and over, in Table III of the 
occupation report, it will be easy for any one to group in the 
manner here presented the occupations of any state or of any 
city of 100,000 population and over. And, with some estimat- 
ing, a similar grouping may be made, from Table IV of the 
occupation report, of the occupations of any city of 25,000 to 
100,000 population. Furthermore, following the classification 
of Table I has made possible a grouping of the occupations of 
the Negroes, since their occupations are classified in Table 2 
of the Negro bulletin* in accordance with the classification of 
Table I of the occupation report. 

The occupations of Table I of the Thirteenth Census report 
on occupations have been rearranged into the following nine 
groups: 

I. Proprietors, officials, and managers. 
II. Clerks and kindred workers. 
III. Skilled workers. 
IV. Semiskilled workers. 
V. Laborers. 
VI. Servants. 
VII. Public officials. 
VIII. Semiofficial public employees. 
IX. Professional persons. 

To any one at all familiar with occupations and occupation 
classification it is hardly necessary to point out the impossibil- 
ity of grouping the 38,000,000 and over gainful workers of 
the United States into nine groups and making each group 
perfectly clear cut and distinct. Each of the above groups 
doubtless includes some workers who properly belong in 
another group, and from each group doubtless are omitted 
some workers who properly belong there. However, it is 
not believed that such additions and omissions are large enough 
to affect materially the percentage reported. 

In Table I, each general division of occupations, as reported 
in Table I of the occupation report, is shown, with its occupa- 
tions rearranged according to the grouping given above. Table 
I, by showing the occupations included in each group, renders 
a special explanation of each group unnecessary. However, 
since there is no unanimity of opinion among statisticians and 
others as to which occupations are skilled, which semiskilled 
and which unskilled, it may be well to state briefly the rules 
according to which occupations were assigned, respectively, 
to Group III, Skilled workers, to Group IV, Semiskilled workers, 
and to Group V, Laborers. 


* “Negroes in the United States,” Bureau of the Census, Bulletin 129. 

As stated in a preceding paragraph, the term skill, for the 
purposes of a grouping such as here presented, is considered 
properly applied only to those occupations in which the ex- 
penditure of muscular force is one of the chief characteristics. 
Within this field, those occupations have been considered 
skilled for the pursuance of which a long period of training or 
an apprenticeship usually is necessary, and which in their 
pursuance call for a degree of judgment and manual dexterity, 
one or both, above that required in semiskilled occupations. 
Those occupations have been considered semiskilled for the 
pursuance of which only a short period or no period of prelim- 
inary training is necessary, and which in their pursuance 
call for only a moderate degree of judgment or of manual dexterity. 
“Laborers” have been considered to include those 
occupations the workers in which require no special training, 
judgment, or manual dexterity, but supply mainly muscular 
strength for the performance of coarse, heavy work. 

Since, in Table I of the occupation report, the semiskilled 
operatives and the laborers in each kind of mines, in quarries, 
in oil and gas wells, and in salt wells and works were reported 
together, a division was necessary for the purposes of this 
grouping. The division is based upon estimates made from 
the detailed figures for each occupation of each of the indus- 
tries involved, as published in Table VI of the occupation 
report. While these estimates are but rough ones, any probable 
errors in them could not affect perceptibly the percentage 
reported in Tables III, IV, and V, for either the “Semiskilled 
workers” or the “Laborers.” 
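
The division described above amounts to applying an estimated percentage to a combined count. A minimal sketch follows; the function name and the combined total of 1,000 are hypothetical, and only the percentages (48 and 52 for coal mines, 22 and 78 for quarries) are taken from Table I.

```python
def split_operatives(combined, semiskilled_pct):
    """Divide a combined count of operatives into semiskilled workers and
    laborers, given the estimated percentage that is semiskilled."""
    semiskilled = round(combined * semiskilled_pct / 100)
    laborers = combined - semiskilled  # remainder, so the parts always sum
    return semiskilled, laborers


# Hypothetical combined count of 1,000 operatives.
coal_semiskilled, coal_laborers = split_operatives(1000, 48)
quarry_semiskilled, quarry_laborers = split_operatives(1000, 22)
print(coal_semiskilled, coal_laborers)
print(quarry_semiskilled, quarry_laborers)
```

Taking the laborers as the remainder, instead of rounding both parts separately, keeps the two groups consistent with the reported combined total.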

For the convenience of any one desiring to group similarly 
the occupations of any state or of any city, each occupation in 
Table I is preceded by its line number as published for states 
in Table II of the occupation report, and for cities of 100,000 
population and over in Table III of the occupation report. 


TABLE I. 

GAINFUL WORKERS OF UNITED STATES, 10 YEARS OF AGE AND OVER, CLASSIFIED 
FOR EACH SEX AND FOR EACH GENERAL DIVISION OF OCCUPATIONS BY SOCIAL- 
ECONOMIC GROUPS OR STRATA, 1910. 

Occupations. | Total. | Male. | Female. 

AGRICULTURE, FORESTRY, AND ANIMAL HUSBANDRY | 12,659,203 | 10,851,702 | 1,807,501 

I. Proprietors, officials, and managers | 6,148,387 | 5,874,861 | 273,526 
… | 2,145 | 2,020 | 125 
… | 61,816 | 59,240 | … 
… | 5,865,003 | 5,607,297 | … 
… | 4,332 | 4,332 | … 
Gardeners, florists, fruit growers, and nurserymen | 139,255 | 131,421 | … 
Owners and managers of log and timber camps | 7,931 | 7,927 | … 
Poultry raisers and poultry yard laborers (a) | 15,384 | 11,777 | … 
Stock raisers | 52,521 | 50,847 | … 

V. Laborers | 6,510,816 | 4,976,841 | … 
Farm, dairy farm, garden, orchard, etc., foremen 
Fishermen and oystermen 
Garden, greenhouse, orchard, and nursery laborers 
Lumbermen, raftsmen, and woodchoppers 
Stock herders, drovers and feeders 
Other and not specified pursuits 

EXTRACTION OF MINERALS 

IV. Semiskilled workers 
Foremen, overseers, and inspectors 
Coal mine operatives (48 per cent. of) 
Copper mine operatives (54 per cent. of) 
Gold and silver mine operatives (… per cent. of) 
Iron mine operatives (51 per cent. of) 
Lead and zinc mine operatives (56 per cent. of) 
All other mine operatives (51 per cent. of) 
Quarry operatives (22 per cent. of) 
Oil and gas well operatives (46 per cent. of) 
Salt well and works operatives (20 per cent. of) 

V. Laborers | 489,543 
Coal mine operatives (52 per cent. of) | 319,240 
Copper mine operatives (46 per cent. of) 
Gold and silver mine operatives (4… per cent. of) 
Iron mine operatives (49 per cent. of) 
Lead and zinc mine operatives (44 per cent. of) 
All other mine operatives (49 per cent. of) 
Quarry operatives (78 per cent. of) 
Oil and gas well operatives (54 per cent. of) 
Salt well and works operatives (80 per cent. of) 

(a) Includes 3,233 poultry yard laborers. 


TABLE I.—(Continued). 

Occupations. | Total. | Male. 

MANUFACTURING AND MECHANICAL INDUSTRIES | 10,658,881 | 8,837,901 

I. Proprietors, officials and managers | 535,223 | 528,213 
72. Builders and building contractors | 174,422 | 173,573 
166. Managers and superintendents (manufacturing) | 104,210 | 102,748 
168. Manufacturers | 235,107 | 230,809 
169. Officials | 21,484 | 21,083 

III. Skilled workers | 3,821,327 | 3,750,936 
Blacksmiths, forgemen, and hammermen 
Boilermakers 
Brick and stone masons 
Butchers and dressers (slaughterhouse) 
Cabinetmakers 
Carpenters 
Compositors, linotypers, and typesetters 
Coopers 
Electricians and electrical engineers 
Electrotypers, stereotypers, and lithographers 
Engineers (mechanical) 
Engineers (stationary) 
… 
Heaters (metal) 
Jewelers, watchmakers, goldsmiths and silversmiths 
Ladlers and pourers (metal) 
Loomfixers 
Machinists, millwrights, and toolmakers 
Mechanics (n. o. s.) (a) 
Millers (grain, flour, feed, etc.) 
Moulders, founders, and casters (metal) 
Painters, glaziers, varnishers, enamelers, etc. 
Paper hangers 
Pattern and model makers 
Plasterers 
Pressmen (printing) 
Rollers and roll hands (metal) 
Roofers and slaters 
288. Shoemakers and cobblers (not in factory) 
Skilled occupations (n. o. s.) (a) 
Structural iron workers (building) 
Tailors and tailoresses | 163,705 
Tinsmiths and coppersmiths | 59,… 
… | 18,928 

IV. Semiskilled workers | 3,681,642 | 2,026,438 
Apprentices | 118,964 
Dressmakers and seamstresses (not in factory) | 449,342 
Dyers | 14,050 
Filers, grinders, buffers and polishers (metal) | 49,525 
Foremen and overseers (manufacturing) | 175,098 
Milliners and millinery dealers | 127,906 
Oilers of machinery | 14,013 
193-286. Semiskilled operatives (n. o. s.) (a) | 2,441,535 
287. Sewers and sewing machine operatives (factory) | 291,209 

(a) Not otherwise specified. 


TABLE I.—(Continued). 

Occupations. 

MANUFACTURING AND MECHANICAL INDUSTRIES—Continued. 

V. Laborers | 2,532,314 
91. Firemen (except locomotive and fire department) | 111,248 
94. Furnacemen and smeltermen | 19,719 
103-161. Laborers (n. o. s.) | 2,401,347 

TRANSPORTATION | 2,637,… | 2,531,075 

I. Proprietors, officials, and managers | 180,729 
303. Captains, masters, mates, and pilots | 24,242 
… | 65,604 
… | 5,256 
… | 34,612 
… 
… transfer companies | 15,… 
Proprietors, officials, and managers (n. o. s.) (a) | 4,839 | 13,411 

II. Clerks and kindred workers 
Agents (express companies) 
Baggagemen and freight agents 
Express messengers and railway mail clerks 
Mail carriers 
Telegraph messengers 
Telegraph operators 
Telephone operators 
Ticket and station agents 

III. Skilled workers 
351. Inspectors, “steam railroad” 
325. Locomotive engineers | … 
326. Locomotive firemen | 76,381 

IV. Semiskilled workers | 570,689 
302. Boatmen, canal men, and lock keepers 
317. Boiler washers and engine hostlers 
318. B… 
306. … 
307. … 
320. Conductors (street railroad) 
321. Foremen and overseers (railroads) 
345. Foremen and overseers (n. o. s.) (a) 
309. Foremen of livery and transfer companies 
353. Inspectors, “other transportation” 
352. Inspectors, “street railroad” 
327. Motormen 
305. Sailors and deck hands 
331. Switchmen, flagmen, and yardmen 
361. Other occupations (semiskilled) 

V. Laborers 
308. Draymen… 
311. … 
322. … 
354. Laborers (n. o. s.) (a) 
304. Longshoremen and stevedores 
341. Telegraph and telephone linemen 

(a) Not otherwise specified. 


TABLE I.—(Continued). 

Occupations. | Total. | Male. | Female. 

TRADE | 3,614,670 | 3,146,582 | 468,088 

I. Proprietors, officials, and managers | 1,404,478 | 1,331,868 | 72,610 
Bankers, brokers, and money lenders | … | 103,170 | 2,634 
Officials of insurance companies | … | 9,376 | 125 
Proprietors, officials, and managers (n. o. s.) (a) | … | 21,352 | 1,010 
Retail dealers | … | 1,127,926 | 67,103 
Undertakers | 20,734 | 19,921 | 813 
Wholesale dealers, importers, and exporters | … | 50,123 | 925 

II. Clerks and kindred workers | 1,717,650 | 1,335,472 | 382,178 
373. Clerks in stores | 387,183 | 275,589 | 111,594 
374. Commercial travelers | 163,620 | … | 2,593 
380. Floorwalkers and foremen in stores | 17,946 | 14,900 | 3,046 
382. Inspectors, gaugers, and samplers | 13,446 | … | … 
384. Insurance agents | 88,463 | … | … 
399. Real estate agents and officials 
401. Salesmen and saleswomen 

IV. Semiskilled workers 
375. Decorators, drapers, and window dressers 
376. Deliverymen 
381. Foremen, warehouses, stock yards, etc. 
393. … 
408. Other pursuits (semiskilled) | 41,640 | 34,068 | … 

V. Laborers | 183,456 | 178,619 | … 
386. Laborers in coal and lumber yards, warehouses, etc. | 81,123 | 80,450 | … 
392. Laborers, porters, and helpers in stores | 102,333 | 98,169 | … 

PUBLIC SERVICE (not elsewhere classified) | 459,291 | 445,733 | … 

VII. Public officials | 128,779 | 116,276 | … 
418. Marshals, sheriffs, and detectives | 23,599 | 23,219 | … 
423. Officials and inspectors (city and county) | 52,254 | 49,668 | … 
426. Officials and inspectors (state and United States) | 52,926 | 43,389 | … 

VIII. Semiofficial public employees (not elsewhere classified) | 263,278 | 262,952 | … 
413. Firemen, fire department | 35,606 | 35,606 | … 
414. Guards, watchmen, and doorkeepers | 78,271 | 78,168 | … 
… | 2,158 | 2,158 | … 
433. Lighthouse keepers | 1,593 | … | … 
429. … | 61,980 | 61,980 | … 
430. Soldiers, sailors, and marines | 77,153 | 77,153 | … 
434. Other occupations | 6,517 | 6,335 | … 

V. Laborers | 67,234 | 66,505 | … 
415. Laborers (public service) | 67,234 | 66,505 | … 

PROFESSIONAL SERVICE | 1,663,569 | 929,684 | 733,885 

IX. Professional persons | 1,644,968 | 919,369 | 725,599 
(All of “Professional service,” except 473, “Attendants and helpers (professional service).”) 

IV. Semiskilled workers 
473. Attendants and helpers (professional service) 

(a) Not otherwise specified. 


TABLE I.—(Concluded). 

Occupations. | Total. | Male. | Female. 

DOMESTIC AND PERSONAL SERVICE | 3,772,174 | 1,241,328 | 2,530,846 

I. Proprietors, officials, and managers | 393,807 | 223,361 | 170,446 
477. Billiard room, dance hall, skating rink, etc., … | 16,761 | 15,943 | 818 
480. Boarding and lodging house keepers | 165,452 | 23,052 | 142,400 
484. Hotel keepers and managers | … | 50,269 | 14,235 
490. Laundry owners, officials and managers | … | 17,057 | 986 
495. Restaurant, café, and lunch room … | … | 50,316 | 10,516 
496. Saloon keepers | … | 66,724 | 1,491 

IV. Semiskilled workers | … | 265,965 | 393,175 
475. Barbers, hairdressers, and manicurists | 195,275 | 172,977 | 22,298 
485. Housekeepers and stewards | 189,273 | … | … 
489. Laundry operatives | 111,879 | … | … 
491. Midwives and nurses (not trained) | 133,043 | … | … 
504. Other pursuits | 29,670 | … | … 

VI. Servants | 2,719,227 | … | 1,967,225 
Bartenders | … | … | 250 
481. Bootblacks | … | … | 20 
Charwomen and cleaners | … | … | 26,839 
Elevator tenders | … | … | 25 
Janitors and sextons | … | … | 21,452 
Laborers (domestic and professional service) | … | … | 3,215 
Launderers and laundresses (not in laundries) | … | … | 520,004 
Porters (except in stores) | … | … | 73 
… | … | … | 1,309,549 
Waiters | 188,… | 102,495 | 85,798 

CLERICAL OCCUPATIONS | 1,737,053 | 1,143,829 | 593,224 

II. Clerks and kindred workers | 1,737,053 | 1,143,829 | 593,224 
511-522. All classed under “Clerical occupations.” 

Table II, in which are brought together the results of Table 
I, shows the number of workers in each of nine social-economic 
groups, and, for each group, a distribution of the workers by 
general division of occupations. 

Table III shows for the total workers and for the workers of 
each sex the number and the proportion of persons in each of 
nine social-economic groups. According to this table, 22.8 per 
cent. of the gainful workers of the United States, in 1910, 
were proprietors, officials, or managers. Almost one worker in 
every ten (9.9 per cent.) was a clerk or kindred worker, and 
slightly more than one in every ten (10.5 per cent.) was a 
skilled worker. Semiskilled workers formed 14.9 per cent., 
laborers 29.4 per cent., and servants 7.1 per cent. of the workers. 
Public officials, semiofficial public employees, and professional 
persons together constituted only 5.3 per cent. of the workers. 
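
The percentages quoted in this paragraph can be reproduced from the group totals of Table II. The following is a minimal sketch; the counts are those printed in the tables, while the dictionary layout and variable names are merely illustrative.

```python
# Group totals as printed in Table II (both sexes, 1910).
groups = {
    "I. Proprietors, officials, and managers": 8_689_724,
    "II. Clerks and kindred workers": 3_781_446,
    "III. Skilled workers": 4_021_598,
    "IV. Semiskilled workers": 5_691_102,
    "V. Laborers": 11_227_214,
    "VI. Servants": 2_719_227,
    "VII. Public officials": 128_779,
    "VIII. Semiofficial public employees": 263_278,
    "IX. Professional persons": 1_644_968,
}

total = sum(groups.values())  # all gainful workers

# Per cent. of all gainful workers in each group, to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in groups.items()}

# Groups VII, VIII, and IX are combined before rounding, as in the text.
last_three = sum(groups[name] for name in list(groups)[-3:])
combined_share = round(100 * last_three / total, 1)

for name, share in shares.items():
    print(f"{name}: {share}")
print("VII, VIII, and IX together:", combined_share)
```

The nine group totals add exactly to the 38,167,336 gainful workers reported, which serves as a check on the transcription of the counts.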





TABLE II. 

GAINFUL WORKERS OF THE UNITED STATES, 10 YEARS OF AGE AND OVER, CLASSIFIED 
BY SOCIAL-ECONOMIC GROUPS OR STRATA, 1910. 

Groups. | Total. | Male. | Female. 

ALL GAINFUL WORKERS | 38,167,336 | 30,091,564 | 8,075,772 

I. Proprietors, officials, and managers | 8,689,724 | 8,164,159 | 525,565 
Agriculture, forestry, and animal husbandry | 6,148,387 | 5,874,861 | 273,526 
Extraction of minerals | 25,234 | 25,127 | 107 
Manufacturing and mechanical industries | 535,223 | 528,213 | 7,010 
Transportation | 182,595 | 180,729 | 1,866 
Trade | 1,404,478 | 1,331,868 | 72,610 
Domestic and personal service | 393,807 | 223,361 | 170,446 

II. Clerks and kindred workers | 3,781,446 | 2,707,187 | 1,074,259 
Transportation | 326,743 | 227,886 | 98,857 
Trade | 1,717,650 | 1,335,472 | 382,178 
Clerical occupations | 1,737,053 | 1,143,829 | 593,224 

III. Skilled workers | 4,021,598 | 3,951,071 | 70,527 
Manufacturing and mechanical industries | 3,821,327 | 3,750,936 | 70,391 
Transportation | 200,271 | 200,135 | 136 

IV. Semiskilled workers | 5,691,102 | 3,623,688 | 2,067,414 
Extraction of minerals | 450,047 | 449,658 | 389 
Manufacturing and mechanical industries | 3,681,642 | 2,026,438 | 1,655,204 
Transportation | 572,586 | 570,689 | 1,897 
Trade | 309,086 | 300,623 | 8,463 
Professional service | 18,601 | 10,315 | 8,286 
Domestic and personal service | 659,140 | 265,965 | 393,175 

V. Laborers | 11,227,214 | 9,594,860 | 1,632,354 
Agriculture, forestry, and animal husbandry | 6,510,816 | 4,976,841 | 1,533,975 
Extraction of minerals | 489,543 | 488,945 | 598 
Manufacturing and mechanical industries | 2,620,689 | 2,532,314 | 88,375 
Transportation | 1,355,476 | 1,351,636 | 3,840 
Trade | 183,456 | 178,619 | 4,837 
Public service | 67,234 | 66,505 | 729 

VI. Servants | 2,719,227 | 752,002 | 1,967,225 
Domestic and personal service | 2,719,227 | 752,002 | 1,967,225 

VII. Public officials | 128,779 | 116,276 | 12,503 
Public service | 128,779 | 116,276 | 12,503 

VIII. Semiofficial public employees | 263,278 | 262,952 | 326 
Public service | 263,278 | 262,952 | 326 

IX. Professional persons | 1,644,968 | 919,369 | 725,599 
Professional service | 1,644,968 | 919,369 | 725,599 


In this grouping, the distinction between the sexes is quite 
marked. While 27.1 per cent. of the male workers are in the 
proprietary, official, and managerial group, only 6.5 per cent. 
of the female workers are in this group; and while 13.1 per 
cent. of the males are skilled workers, only 0.9 per cent. of the 
females are skilled workers. Only 2.5 per cent. of the male 
workers are servants, as compared with 24.4 per cent. of the 
females; and only 3.1 per cent. of the male workers are professional 
persons, as compared with 9 per cent. of the females. 
Semiskilled workers, laborers, and servants together constitute 
over seven out of ten (70.2 per cent.) of the female workers. 
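
The comparison by sex follows the same arithmetic, with each count divided by the total for its own sex. In the sketch below the counts are taken from Table II; the female semiskilled figure is obtained here by subtracting the male figure from the group total, and the helper name is only illustrative.

```python
MALE_TOTAL = 30_091_564
FEMALE_TOTAL = 8_075_772


def per_cent(count, total):
    """Share of `total` represented by `count`, to one decimal place."""
    return round(100 * count / total, 1)


# Proprietors, officials, and managers, by sex.
proprietors_male = per_cent(8_164_159, MALE_TOTAL)
proprietors_female = per_cent(525_565, FEMALE_TOTAL)

# Servants, by sex.
servants_male = per_cent(752_002, MALE_TOTAL)
servants_female = per_cent(1_967_225, FEMALE_TOTAL)

# Female semiskilled workers: group total less the male figure.
semiskilled_female = 5_691_102 - 3_623_688
laborers_female = 1_632_354
lower_three_female = per_cent(
    semiskilled_female + laborers_female + 1_967_225, FEMALE_TOTAL
)

print(proprietors_male, proprietors_female)
print(servants_male, servants_female)
print(lower_three_female)
```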

In Table I of the Thirteenth Census occupation report, from 
the occupations of which the groups here presented were 
formed, certain specific occupations which, technically, are 
skilled occupations were classified as semiskilled because the 
enumerators returned so many children, young persons, and 
women as pursuing these occupations as to render the occupa- 
tions semiskilled, even though each of them did contain some 
skilled workers. For this reason, it is believed that the group 
of skilled workers as here presented is somewhat too small. 


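TABLE III. 

NUMBER AND PROPORTION OF THE TOTAL WORKERS AND OF THE WORKERS OF 
EACH SEX ENGAGED IN EACH OF NINE SOCIAL-ECONOMIC GROUPS, UNITED 
STATES, 1910. 

Group. | Total: Number. | Per Cent. | Male: Number. | Per Cent. | Female: Number. | Per Cent. 

ALL GROUPS | 38,167,336 | 100.0 | 30,091,564 | 100.0 | 8,075,772 | 100.0 

I. Proprietors, officials, and managers | 8,689,724 | 22.8 | 8,164,159 | 27.1 | 525,565 | 6.5 
II. Clerks and kindred workers | 3,781,446 | 9.9 | 2,707,187 | 9.0 | 1,074,259 | 13.3 
III. Skilled workers | 4,021,598 | 10.5 | 3,951,071 | 13.1 | 70,527 | 0.9 
IV. Semiskilled workers | 5,691,102 | 14.9 | 3,623,688 | 12.0 | 2,067,414 | 25.6 
V. Laborers | 11,227,214 | 29.4 | 9,594,860 | 31.9 | 1,632,354 | 20.2 
VI. Servants | 2,719,227 | 7.1 | 752,002 | 2.5 | 1,967,225 | 24.4 
VII. Public officials | 128,779 | 0.3 | 116,276 | 0.4 | 12,503 | 0.2 
VIII. Semiofficial public employees | 263,278 | 0.7 | 262,952 | 0.9 | 326 | (a) 
IX. Professional persons | 1,644,968 | 4.3 | 919,369 | 3.1 | 725,599 | 9.0 

(a) Less than one tenth of 1 per cent. 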


Table IV shows for the total Negro workers of the United 
States, and for the Negro workers of each sex, the number and 
proportion in each of nine social-economic groups in 1910. 
For purposes of comparison, the data for the total workers, 
given in Table III, and the data for the Negro workers, given 
in Table IV, are reproduced in Table V, with additional data 
for “White and all other” workers. Table V thus shows, for 
both sexes and for each sex separately, the number and proportion 
of the total workers, of the Negro workers, and of the 
white and all other workers engaged in each of nine social-economic 
groups in 1910. Since only 200,475 of the workers 
in the “White and all other” group are nonwhite, this group, 
for comparative purposes, may be considered practically white. 
Therefore, in the discussion of Table V, the workers in this 
group will be referred to as white. 


TABLE IV. 

NUMBER AND PROPORTION OF THE TOTAL NEGRO WORKERS AND OF THE NEGRO 
WORKERS OF EACH SEX ENGAGED IN EACH OF NINE SOCIAL-ECONOMIC 
GROUPS, UNITED STATES, 1910. 

Group. | Total: Number. | Per Cent. | Male: Number. | Per Cent. | Female: Number. | Per Cent. 

ALL GROUPS | 5,192,535 | 100.0 | 3,178,554 | 100.0 | 2,013,981 | 100.0 

I. Proprietors, officials, and managers | 933,538 | … | 837,872 | 26.4 | 95,666 | 4.8 
II. Clerks and kindred workers | … | … | 30,386 | 1.0 | 6,110 | 0.3 
III. Skilled workers | … | … | 111,852 | 3.5 | 856 | … 
IV. Semiskilled workers | … | … | 172,965 | 5.4 | 101,620 | 5.0 
V. Laborers | … | … | 1,746,227 | 54.9 | 985,934 | 49.0 
VI. Servants | … | … | 233,181 | 7.3 | … | 39.5 
VII. Public officials | … | … | 831 | … | … | … 
VIII. Semiofficial public employees | … | … | 8,435 | … | … | … 
IX. Professional persons | … | … | 36,805 | … | … | 1.4 

(a) Less than one tenth of 1 per cent. 


Coming at once, in Table V, to the percentages for the male 
workers, we note that 26.4 per cent. of the Negroes, as compared 
with 27.2 per cent. of the whites, were proprietors, 
officials, and managers. The fact that almost as large a pro- 
portion of the Negroes as of the whites were in this group is ex- 
plained by the further fact that 25.3 per cent. of all Negro, as 
compared with 18.8 per cent. of all white male workers were 
agricultural proprietors. Agricultural proprietors constituted 
96 per cent. of the Negro and but 69.2 per cent. of the white 
male workers in Group I. In 1910, but a small proportion of 
the Negro male workers were engaged in clerical or kindred 
pursuits—only 1 per cent. of them being in this group, as com- 
pared with 9.9 per cent. of the whites. Likewise, the propor- 
tion of the Negro males who were skilled workers, 3.5 per cent., 
was quite small as compared with the proportion for the white 
males, 14.3 per cent.; and the Negro semiskilled male workers 
constituted only 5.4 per cent. of the total, as compared with 
12.8 per cent. for the white semiskilled male workers. Of the 
Negro male workers, considerably more than one half were 
laborers—54.9 per cent. as compared with 29.2 per cent. for 
the white male workers. Of the Negro male laborers, 59.5 
per cent. were agricultural laborers, etc., as compared with 
37.8 per cent. of the white male laborers. Servants consti- 
tuted 7.3 per cent. of all Negro male workers, and but 1.9 per 
cent. of all white male workers. Laborers and servants com- 
bined formed 62.2 per cent. of the Negro male workers, and 
but one half as large a proportion (31.1 per cent.) of the white 
male workers. The proportion of the male workers who were 
public officials, semiofficial public employees, and professional 
persons, respectively, was very much smaller for the Negroes 
than for the whites. 

Of the female gainful workers, 4.8 per cent. of the Negroes 
and 7.1 per cent. of the whites were proprietors, officials, or 
managers. Only 0.3 per cent. of the Negro female workers 
were engaged in clerical or kindred pursuits, as compared with 
17.6 per cent. of the white female workers. Semiskilled 
workers constituted 5 per cent. of all Negro female workers 
and 32.4 per cent. of all white female workers. Almost one 
half (49 per cent.) of the Negro female workers were laborers, 
and almost two out of every five of them (39.5 per cent.) were 
servants. Laborers and servants together constituted not 
far from nine out of every ten (88.5 per cent.) of the Negro 
female workers, as compared with only three out of every ten 
(30 per cent.) of the white female workers. Only 1.4 per cent. 
of the Negro female workers were engaged in professional 
pursuits, as compared with 11.5 per cent. of the white female 
workers. 

Table VI shows the number and the proportion of the gain- 
ful workers in each specified social-economic group and sub- 
group at each of the censuses, 1910, 1900, 1890, 1880, and 1870. 
The classification of occupations followed at the Thirteenth 
Census shows occupations in so much greater detail than they 
were shown at preceding censuses that it was impossible in 
certain cases to rearrange the occupations of preceding cen- 
suses according to the grouping given in Tables III, IV, and 
V for the Thirteenth Census occupations. In Table VI, it has 
been necessary to combine groups III, IV, and V of Tables 
III, IV, and V: the skilled workers, the semiskilled workers, 
and the laborers. 


TABLE V. 
NUMBER AND PROPORTION OF THE TOTAL WORKERS OF THE UNITED STATES, OF 
THE NEGRO WORKERS, AND OF THE WHITE AND ALL OTHER WORKERS ENGAGED 
IN EACH OF NINE SOCIAL-ECONOMIC GROUPS, FOR BOTH SEXES AND FOR EACH 
SEX SEPARATELY, 1910. 

Group and Sex. | Total: Number. | Negro: Number. | White and all Other (a): Number. | Per Cent. 

Both sexes, all groups | 38,167,336 | 5,192,535 | … | … 
I. Proprietors, officials, and managers | 8,689,724 | 933,538 | … | … 

(a) Only six tenths of 1 per cent. of this group is nonwhite. 
(b) Less than one tenth of 1 per cent. 


III, IV, and V. Skilled and semiskilled workers and … 
(a) The contents of each group of Table VI are shown by Table I. 


For some purposes it is desirable to have—in addition to the 
skilled workers, the semiskilled workers, and the laborers— 
another group, commonly, though somewhat incorrectly, 
called “industrial wage earners.” This group includes the 
manual workers engaged in the extraction of minerals, manufacturing, 
transportation, and trade. Since many skilled 
workmen, such as carpenters, painters, etc., who are working 
on their own account and are not receiving wages, are included 
in this group, it is not exactly accurate to call the group 
“industrial wage earners.” Industrial manual workers is a better 
designation. The group may be formed by combining, in 
Table II, above, all the skilled workers, and the semiskilled 
workers and the laborers in the extraction of minerals, manu- 
facturing and mechanical industries, transportation, and 
trade. This group is brought out in Table VI as one of the 
three sub groups of group “III, IV, and V.” 

For the purposes of the grouping for Table VI, it was necessary in
three different cases to separate occupations combined at some of
the earlier censuses. In making these separations, each occupation
was allotted that portion of the total for the combination which
it constituted at the first census at which it was shown
separately. “Laborers (not specified),” classified under domestic
and personal service prior to the Thirteenth Census, were
distributed for 1900, 1890, 1880, and 1870, respectively, by
allotting to each of four groups of Table VI (agricultural
laborers, etc., industrial manual workers in manufacturing,
industrial manual workers in trade and transportation, and
semiskilled workers and laborers in service) that portion of the
total which a careful analysis of the occupational designations
classified in 1900 under “Laborers (not specified)” indicated the
group contained in 1910. Also, because of the differences in
classification, some estimating was necessary in the case of each
group of Table VI in order to avoid having two slightly different
numbers and per cents. for each group for the census year 1910,
one number and one per cent. in Tables III and V and another and
slightly different number and per cent. in Table VI. But, since in
the case of no group did the number involved in the estimate equal
as many as 1 per cent. of the total workers for the census year, it
is not probable that errors in these estimates affected materially
the percentage reported for any group. And, since, for example, an
error of 381,673 would be necessary in order to affect by as much
as 1 per cent. the percentage reported for any group for the year
1910, it is not probable that any percentage reported has been
affected materially by the different estimates made.
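
The allotment rule and the sensitivity argument may be sketched in a few lines of Python. This is an illustration only: the function names, the occupation labels, and the small counts are hypothetical, and the 1910 total of 38,167,336 gainful workers is assumed here as the base of which 381,673 is 1 per cent.

```python
def allot(combined_total, base_counts):
    """Split a combined total in proportion to counts from a base census.

    base_counts holds, for each occupation, its number at the first
    census at which it was shown separately (hypothetical values here).
    """
    base_sum = sum(base_counts.values())
    return {name: combined_total * count / base_sum
            for name, count in base_counts.items()}


def error_for_shift(total_workers, points):
    """Workers that must be misplaced to move a group's share by `points`."""
    return total_workers * points / 100.0


# Hypothetical counts for two occupations once combined under one heading.
base = {"occupation A": 300, "occupation B": 100}
shares = allot(1_000, base)  # hypothetical combined total of 1,000

TOTAL_1910 = 38_167_336      # assumed total gainful workers, 1910
needed = error_for_shift(TOTAL_1910, 1.0)

print(shares)
print(round(needed))
```

An estimate that touches fewer workers than `needed` cannot move any reported percentage by a full point, which is the substance of the argument.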

According to Table VI, the proportion of the gainful workers 
of the United States engaged in proprietary, official and mana- 
gerial positions (Group I) decreased from 29.2 per cent. in 
1870 to 22.8 per cent. in 1910. This decrease was confined 
entirely to agricultural proprietors who constituted 24.2 per 
cent. of all gainful workers in 1870 and only 16.1 per cent. in 
1910. The proportion which the workers in each of the other 
subgroups of Group I constituted of all gainful workers was 
considerably larger in 1910 than in 1870. 

Clerical and kindred workers increased rapidly from 2.9 per 
cent. of all workers in 1870 to 9.9 per cent. in 1910. No other 
group made so great a change during this period in the propor- 
tion it constituted of the total gainful workers. The numerical 
increase in the workers engaged in each clerical or kindred 
pursuit was also quite general. 

It is interesting to note that between 1870 and 1910 there was no
marked change in the proportion which the large group, the skilled
and semiskilled workers and the laborers, combined (Group III, IV,
and V), constituted of all gainful workers. During the four
decades this proportion did not vary greatly from 55 per cent. In
certain of the subgroups of Group III, IV, and V, however, the
proportion which the workers constituted of the total changed
considerably between 1870 and 1910. “Agricultural laborers, etc.”
decreased during this period from 23.7 per cent. to 17.1 per cent.
of the total. This decrease was in line with, though not so rapid
as, the decrease in the proportion which agricultural proprietors
constituted of the total workers. The increase between 1900 and
1910 in the proportion which “Agricultural laborers, etc.”
constituted of all workers is believed to be due to the
enumeration as agricultural laborers, in 1910, of women and
children such as would not have been so enumerated in 1890.* The
“Industrial manual workers” (Group III, IV, and V, subgroup 2) are
the workers in whose interest most of our labor and much of our
social legislation has been and is being enacted. This is a group
consisting mainly of industrial wage earners. It is, for many
reasons, a group of peculiar and special interest. It increased
from 31.3 per cent. of all workers in 1870 to 35.9 per cent. in
1910. There was, also, during this period, an increase in the
proportion which the workers in each subgroup of the industrial
manual workers constituted of all gainful workers. The industrial
manual workers may be classified according to skill for the census
of 1910 as follows:


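INDUSTRIAL MANUAL WORKERS, 1910

Sex and Group.                      Number.

Both sexes                       13,684,123
   Skilled workers                4,021,598
   Semiskilled workers            5,013,361
   Laborers                       4,649,164

Male                             11,849,993
   Skilled workers                3,951,071
   Semiskilled workers            3,347,408
   Laborers                       4,551,514

Female                            1,834,130
   Skilled workers                   70,527
   Semiskilled workers            1,665,953
   Laborers                          97,650

[Row labels restored from the column totals. The columns headed
“Per Cent. of Industrial Manual Workers” and “Per Cent. of Total
Gainful Workers” are not recoverable.]
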

* See Thirteenth Census report on occupations, pp. 26-29.

While the number of persons in the servant group considerably
more than doubled during the 40 years from 1870 to 1910, the
proportion which servants of all kinds constituted of the total
gainful workers declined from 8.4 per cent. in 1870 to 7.1 per
cent. in 1910. This group, as shown by Group VI of Table I, above,
includes all servant pursuits. Servants and waiters, more strictly
defined, decreased, relatively, even more rapidly: from 7.5 per
cent. (941,392) of all gainful workers, in 1870, to 4.9 per cent.
(1,867,448) in 1910.

The proportion which public officials (Group VII) consti- 
tuted of all gainful workers declined slightly between 1870 
and 1910; but the semiofficial public employees* increased 
from 0.3 per cent. of all gainful workers in 1870 to 0.7 per cent. 
in 1910. Professional persons increased rapidly in relative 
numerical importance from 2.6 per cent. of all gainful workers 
in 1870 to 4.3 per cent. in 1910. 


* Firemen, fire department; guards, watchmen, and doorkeepers; life savers; lighthouse keepers;
policemen; soldiers, sailors, and marines; and “other occupations.”
THE HORIZONTAL ZERO IN FREQUENCY
DIAGRAMS.


By EARLE CLARK, Russell Sage Foundation.


It is a generally accepted rule of graphic presentation that 
a zero, used in a diagram as a point of reference, should be 
included in the diagram. This rule, while it is observed in 
most statistical work, is almost universally disregarded in the 
drafting of frequency diagrams. 

Diagram 1, presented herewith, is a frequency graph of a common
type, based on the weights of 738 men.* Weights are indicated on
the base line, and the per cent. of cases corresponding to any
given weight is proportionate to the vertical distance from the
base line to the curve. A zero line is the most conspicuous
feature of this diagram, but inspection of the figure shows that
the presentation implies two zeros, and that only one of these is
shown. The vertical scale, representing percentages, begins at the
zero base line, but the horizontal scale, representing weights,
begins at 90 pounds. It is the purpose of this paper to state
reasons for including the horizontal zero, to direct attention to
a type of frequency diagrams to which these reasons do not apply,
and to illustrate methods of drafting.

DIAGRAM 1. Weights of 738 men, shown without horizontal zero.

*The data are for 738 men born in Wales, as shown in Yule’s “Introduction to the Theory of Statistics,”
p. 95. For convenience in presentation, the extremes of the distribution have been arbitrarily shortened.

A frequency diagram is plotted for the purpose of showing 
the significant facts about a series of variables. The graphic 
form is used rather than a frequency table or text statement 
because most people, even most statisticians, find it easier to 
perceive and appreciate these significant facts by looking at a 
diagram than by studying a column of figures. The essential 
facts about a variable series are: (1) the mean, median, or 
other measure of central tendency, and (2) the distribution of 
the values about this central tendency. These facts are inter- 
dependent. It is a simple matter to compute medians or 
means, but these measures do not reveal the whole truth about 
a distribution; they may be seriously misleading unless shown 
in relation to the distribution of the individual values. 

On the other hand, the distribution is not in itself significant
unless related to the central tendency. Stated in pounds and
ounces, the average deviation* of the weights of a group of 1,000
elephants would doubtless be far greater than the average
deviation of the weights of 1,000 canary birds, but this would not
necessarily mean that the weights of elephants are relatively more
variable than the weights of canary birds. In order to determine
the true variability of a series it is necessary to relate the
measure of dispersion to the measure of central tendency. This may
be done by computing a coefficient of dispersion,† a ratio which
expresses the dispersion as a proportion of the measure of central
tendency.
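
A minimal sketch of the two measures may make the distinction concrete. It assumes the median as the measure of central tendency and takes the differences without regard to sign; the function names and the two tiny samples are hypothetical and merely echo the elephant and canary illustration.

```python
from statistics import median


def average_deviation(values, center=None):
    """Mean absolute difference of the values from their central tendency."""
    c = median(values) if center is None else center
    return sum(abs(v - c) for v in values) / len(values)


def coefficient_of_dispersion(values):
    """Average deviation expressed as a proportion of the median."""
    c = median(values)
    return average_deviation(values, c) / c


# Hypothetical weights in pounds, invented for illustration only.
elephants_lb = [9000, 10000, 11000, 12000, 13000]
canaries_lb = [0.03, 0.04, 0.05, 0.06, 0.07]

for name, sample in (("elephants", elephants_lb), ("canaries", canaries_lb)):
    print(name, average_deviation(sample), coefficient_of_dispersion(sample))
```

In this invented case the elephants show by far the larger average deviation in pounds, yet the canaries are relatively the more variable series.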

*The average deviation is a statistical expression indicating the dispersion of the values of a series of
variables about their central tendency. It is obtained by adding together the differences between all the
individual values and the central tendency, and dividing the result by the number of values.

†The coefficient of dispersion is the measure of dispersion (the average deviation, standard deviation, or
probable error) divided by the measure of central tendency.

It follows that, if a frequency diagram is to serve the purpose
for which it is intended, it must show, with all possible
clearness and effectiveness, the distribution of the individual
values, the central tendency, and the relation of the distribution
to the central tendency. Diagram 1 shows the distribution of the
measures. Does it also show, with the emphasis required, the two
other essential facts?

On Diagram 1 the median is indicated in the usual way, by a
vertical line dividing into two equal parts the surface of the
figure enclosed by the curve and the base line. This line is
sometimes referred to as the median line, but the designation does
violence to the principles of graphic presentation. In diagrams,
lines or areas are, or should be, proportionate to the quantities
they represent. The length of the so-called “median line” is not
proportionate to the median weight of men; it is proportionate
rather, as the class interval for the distribution is 20 pounds,
to the approximate number of men whose weights fall within limits
fixed, respectively, at 10 pounds below and at 10 pounds above the
median weight. The line represents, in other words, not the median
value for the series, but a number of cases. There is nowhere on
the diagram a line representing by its length, or a surface
representing by its area, the median weight of the men.

DIAGRAM 2. Weights of 738 men, shown with horizontal zero.

The median can be determined, it is true, by referring to the
scale at the foot of the figure. As the point of intersection of
the so-called “median line” with the base line falls at 156
pounds, as indicated by the horizontal scale, it follows that this
value is the median, but the result is not obtained by the graphic
method. The figures on the scale are not graphic representations
any more than are the figures of a table or a text statement.

The median can, however, be shown by the graphic method by so
extending the base line that the horizontal scale will include the
zero. This method has been followed in preparing Diagram 2. In
Diagram 2 the horizontal distance from the vertical line at the
left of the figure to the so-called “median line,” measured on the
base line or along any abscissa, represents the median weight of
the men.

If the inclusion of the horizontal zero is required for a com- 
plete graphical representation of the median, it is even more 
essential as a means of showing the relationship of the dis- 
persion to the median. As Diagram 1 contains no graphical 
representation of the central tendency, it follows that it affords 
no graphical representation of the relation between the central 
tendency and the dispersion. The dispersion of the series is 
indicated by the form of the curve and also by a line beneath 
the base line, proportionate in length to the average deviation 
(14.2 pounds), drawn to scale and extending to the left of the 
median. By including this line, the dispersion is reduced to 
a single graphical expression, but the diagram contains no 
graphical representation of the median with which either the 
line or the curve can be compared. 

An effective graphical representation of the relationship between
the central tendency and the distribution is found in Diagram 2,
in which the median, represented by the distance between the
horizontal zero and the vertical “median line,” can be compared
both with the surface of frequency, as indicated by the curve, and
with the line representing the average deviation. The ratio of the
length of this line to the distance from the horizontal zero to
the median line is equivalent to the coefficient of dispersion.

The difficulties arising from the omission of the horizontal zero
are further illustrated in Diagram 3, in which the weights of the
738 men are compared with the weights of 279 thirteen- and
fourteen-year-old school boys.*

DIAGRAM 3. Weights of 738 men and 279 boys, shown without horizontal zeros.

*The data, which are for boys attending the Worcester, Mass., public schools, are from a report by Franz
Boas and Clark Wissler, published in the report of the U. S. Commissioner of Education for 1904.

In Diagram 3 the scales for pounds are identical in both figures.
The appearance of the diagram suggests that the two distributions
are very much alike; as the figure for men has a greater spread at
the base line than that for boys it would seem that the former
represents, if anything, the wider dispersion. This impression is
not borne out by the data. The actual dispersion (average
deviation) is, roughly, the same for the two series: 14.2 pounds
for the men and 14.3 pounds for the boys. But as the median for
the men is 156.3 pounds, and that for the boys 90.8 pounds,
computation shows that the significant measure of relative
variability, the coefficient of dispersion, is .157 for the boys
and only .091 for the men. In other words, the dispersion of the
weights of the boys is 15.7 per cent. of the median weight of
boys, while for the men the dispersion of the weights is but 9.1
per cent. of the median weight of men. The apparent similarity of
the two distributions represented in Diagram 3 is, therefore,
accidental and the diagram is misleading.
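
The two coefficients quoted can be checked from the figures given in the text. The short Python sketch below does only that arithmetic; the variable names are illustrative.

```python
# Average deviations and medians, in pounds, as quoted in the text.
men_avg_dev, men_median = 14.2, 156.3
boys_avg_dev, boys_median = 14.3, 90.8

# Coefficient of dispersion: average deviation divided by the median.
men_coefficient = men_avg_dev / men_median
boys_coefficient = boys_avg_dev / boys_median

print(round(men_coefficient, 3), round(boys_coefficient, 3))
```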
It may be said that anyone using Diagram 3 could determine the
relative dispersions by a study of the figures of the scales; that
the scales show the medians, and that it is not impossible to
relate these medians to the dispersions. This is true, but, as the
same facts can be determined from a frequency table, the argument
offered is merely an argument for not using graphical
representations for comparing two or more series of variables.

DIAGRAM 4. Weights of 738 men and 279 boys, shown with horizontal zeros.

Diagram 4 shows in graphic terms the true relationship
between the dispersions. The base lines of Figures A and B
of this diagram have been carried out to zero, and the scales
have been so adjusted that the distance from zero to the median 
is the same in both figures. It is now possible to view the dis- 
persions in their relationship to the central tendencies. The 
lines representing the average deviations, as well as the con- 
tours of the curves, show very clearly that the weights of 
boys are much more widely dispersed than the weights of men. 
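
The adjustment of scales described for Diagram 4 amounts to choosing, for each figure, the number of units of length per pound that brings the median to the same distance from zero. A rough sketch follows; the common length of 100 millimetres and the function name are assumptions made for illustration, and only the medians and average deviations come from the text.

```python
def scale_to_common_median(median_lb, median_length_mm=100.0):
    """Millimetres per pound so that zero-to-median spans median_length_mm."""
    return median_length_mm / median_lb


# (median, average deviation) in pounds, as quoted in the text.
series = {"men": (156.3, 14.2), "boys": (90.8, 14.3)}

lengths = {}
for name, (median_lb, avg_dev_lb) in series.items():
    mm_per_lb = scale_to_common_median(median_lb)
    # Drawn length of the line representing the average deviation.
    lengths[name] = avg_dev_lb * mm_per_lb

print({name: round(mm, 1) for name, mm in lengths.items()})
```

With the medians drawn to equal length, the line for the boys comes out longer than the line for the men in the ratio of the two coefficients of dispersion, which is what the diagram is meant to show.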

The fact that in Diagram 4 the surface enclosed by the 
curve and base line of Figure B is much greater than that en- 
closed by the curve and base line of Figure A, might lead an 
incautious observer to assume that the dissimilarity in the 
appearance of the figures is due to a difference in the number 
of observations—that the number of boys exceeds the num- 
ber of men. Such an inference would be unwarranted. As 
numbers have been reduced to percentages, 100 per cent. is 
the total for each group. The values are plotted upon the 
ordinates; hence, the spaces between the ordinates, and the 
areas enclosed by the curves and the base lines, are without 
significance. It is believed that the diagram affords a correct 
interpretation of the data; that it gives an impression of two 
groups of which one is somewhat closely clustered about its 
central tendency, while the other is much more widely dis- 
persed. 

It should be noted that there is an important group of 
frequency diagrams to which the arguments in favor of in- 
cluding the horizontal zero, which have been stated in the 
preceding pages, do not apply. These are diagrams of dis- 
tributions in which the zero cannot be exactly located. In the 
so-called normal frequency distribution the base line and the 
ends of the curve are in asymptote—the ends and the base 
line are tangent at infinity. It follows that, in plotting proba- 
bilities, or results in the psychological field which are based 
not upon concrete measurements but upon rankings, the hori- 
zontal zero can not be shown. 

But it is also impossible to show a zero based upon data of 
this kind in any type of diagram, and this is true whether the 
zero is vertical or horizontal. If the horizontal zero can not 
be shown in a frequency diagram representing the distribution 
of school boys with reference to a given mental trait, as deter- 
mined by the rankings of competent judges, neither can a 
zero be shown in a diagram in which the ability of any one of
these boys at successive tests is indicated by a historical curve. 
It is possible to present a horizontal zero in a frequency dia- 
gram for any data for which a vertical zero for an ogive curve 
can be shown. 

A practical objection to the inclusion of the horizontal zero 
is the fact that additional space is required. But this objec- 
tion is no more applicable to the horizontal zero in frequency 
diagrams than to the vertical zero in line diagrams. The 
inclusion of the vertical zero in diagrams of the latter type is 
the established practice. And an inspection of the diagrams 
presented with this paper makes it clear that the inclusion of 
the horizontal zero presents no serious difficulties. A case 
will occasionally be encountered in which the dispersion con- 
stitutes so small a proportion of the central tendency that the 
zero, whether horizontal or vertical, must be omitted, but such 
cases are most exceptional. 

The arguments and the illustrations presented in the pre- 
ceding pages seem to support the following conclusions: In 
frequency diagrams, where the position of the horizontal zero 
is exactly ascertainable, and where the dispersion is not too 
small in proportion to the measure of central tendency, the 
horizontal zero should be included in the diagram. This 
means that the horizontal zero should be included in a fre- 
quency diagram in all cases in which a zero for similar data 
would be included in any type of diagram. Without the 
horizontal zero the frequency diagram does not afford a com- 
plete graphical representation of the central tendency nor of 
the relationship of the central tendency to the distribution. 
THE COEFFICIENT OF CORRELATION.*


By WILLIAM GARDNER REED, U.S. Weather Bureau.


In many studies it is necessary or at least desirable to test the
existence of concomitant variation between two series of variable
quantities. A comparison of the plotted variables furnishes a
rough, but for some purposes adequate, means of examining the
relationship. Figure 1 is an example of this sort of comparison.
However, the use of curves is not to be recommended for careful
work because of the difficulty in selecting the proper scales and
the dangers resulting from personal bias. The usual tabular method
is slightly more refined, but tables involve too many figures to
give an adequate idea of the conditions and give no concise
measure of the degree of relationship.

FIGURE 1. Relation between the July rainfall and the yield of
corn, 1888-1915. The solid line indicates the departure of the
average rainfall from the normal for the month of July over the
following-named States, for the 28 years indicated: Ohio, Indiana,
Illinois, Iowa, Nebraska, Kansas, Missouri, and Kentucky. The
broken line shows the departure of the average yield of corn from
the normal, in bushels per acre, for the same area. (U.S. Weather
Bur., National Weather and Crop Bul., Ser. 1916, No. 14, June 20,
1916.)


* For instructions for obtaining the coefficient of correlation see Persons, W. M.: The Correlation of
Economic Statistics, Am. Stat. Asso., Quart. Publ., Vol. 12, pp. 287-322, Boston, 1910.
The English biometricians have perfected a method of stating the
degree of relationship, which was invented by Bravais about 1845.
“Correlation may be briefly defined as the tendency towards
concomitant variation and the so-called correlation coefficient is
simply a measure of such tendency, more or less adequate according
to the circumstances of the case.”* The early statements of the
use of the coefficient of correlation indicate clearly that the
attempt to obtain such a coefficient from miscellaneous material
is an abuse of this method of measuring relationship.† The
material in hand should be investigated carefully before any
attempt is made to determine the relationship by the use of the
coefficient of correlation. This investigation may take the form
of a correlation table or of a “dot chart” after Galton’s graphic
method of correlation.‡


METHOD OF PROCEDURE. 


If the coefficient of correlation is to have any definite 
meaning, the procedure must be somewhat as follows: 

1. The material (e. g. Table I) should be arranged in groups 
in the form of a correlation table (Table II), or, better, plotted 
as a dot chart (Figure 2). The table or chart should then be 
carefully examined to see whether the points may be general- 
ized to a straight line, that is, whether there is a tendency 
for a high value of one variable to be associated with high 
values of the other variable and proportionately higher or 
lower values of the one to be associated with similar values of 
the other. This shows positive linear correlation. When lower 
values of the one are associated with higher values of the other, 
the correlation is said to be negative. For example, the dots 
in Figure 2 may be generalized to the line AB as well as to 
any curve. 

*Brown, W.: The Essentials of Mental Measurement, Cambridge, University Press, 1911, p. 42.
(Italics are the present writer's.)

† Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 169,
177.

‡ See Davenport, C. B.: Statistical Methods, ed. 3, New York, Wiley, 1914, pp. 42-47.
TABLE I.

CORRELATION OF JULY RAINFALL AND THE YIELD OF CORN IN OHIO.

(Smith, J. W.: The Effect of Weather upon the Yield of Corn in Ohio. Washington, Mo. Weather Rev.,
Vol. 42, 1914, p. 80.)

[Table body not recoverable; only the summary computations survive.]


CORRELATION BETWEEN JULY PRECIPITATION 
AND YIELD OF CORN IN OHIO 
$M'_x = 4.0$ inches          $M'_y = 35$ bu.

$M_x = 4.0 + \frac{3.9}{60} = 4.1$          $M_y = 35 - \frac{26}{60} = 34.6$

$\Sigma x = +3.9$          $\Sigma y = -26$

$\Sigma x^2 = 112.67$          $\Sigma y^2 = 1258$

$r = \frac{3.36 + .03}{6.44} = +0.526 \pm E_r$

$E_r = \pm .674 \, \frac{1 - r^2}{\sqrt{n}} = \pm .674 \times \frac{.723}{\sqrt{60}} = \pm .063$

$r = +0.526 \pm .063$








Note: r is not the same here as in the original paper 
because a single average yield of corn has been used for sim- 
plicity. 

EXPLANATION OF SYMBOLS. 


$n$  number of observations (years of record)
$M_x$  true mean July precipitation
$M'_x$  some arbitrary number near $M_x$
$M_y$  true mean yield of corn
$M'_y$  some arbitrary number near $M_y$
$x$  departure of each July precipitation from $M'_x$
$y$  departure of yield of corn in each year from $M'_y$
$\Sigma x$  algebraic sum of departures of July precipitation
$\Sigma y$  algebraic sum of departures of yield of corn
$\Sigma x^2$  algebraic sum of squares of departures of July precipitation
$\Sigma y^2$  algebraic sum of squares of departures of yield of corn
$\Sigma xy$  algebraic sum of products of departures ($x$ and $y$)
$\sigma_x$  standard deviation of July precipitation

    $\sigma_x = \sqrt{\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2}$

$\sigma_y$  standard deviation of yield of corn

    $\sigma_y = \sqrt{\frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2}$

$r$  coefficient of correlation

    $r = \frac{\frac{\Sigma xy}{n} - \left(\frac{\Sigma x}{n}\right)\left(\frac{\Sigma y}{n}\right)}{\sigma_x \sigma_y}$

$E_r$  probable error of the coefficient of correlation

    $E_r = \pm .674 \, \frac{1 - r^2}{\sqrt{n}}$

TABLE II.
CORRELATION TABLE SHOWING THE RELATION BETWEEN JULY PRECIPITATION
AND THE YIELD OF CORN IN OHIO.
(From Smith, J. W.: The Effect of Weather on the Yield of Corn, Washington, Mo., Weather Rev.,
Vol. 42, 1914, pp. 78-93.)

[Table body not recoverable. Rows: July precipitation in inches. Columns: yield of corn in
bushels per acre, by classes beginning with 20.0 to 24.9.]
2. If it appears from this examination that a straight line is as
good a fit as any other type of curve not too complicated to be
useful as a measure of relationship, the data may be replotted on
a new dot chart for which the unit of measurement on one axis is
the standard deviation of one of the variables, and the unit on
the other axis is the standard deviation of the other variable
(see Figure 3).

CORRELATION BETWEEN JULY PRECIPITATION
AND YIELD OF CORN IN OHIO

3. The position of the straight line which most nearly
satisfies the data on the second dot chart may be determined 
rigidly by the method of least squares. When the standard 
deviation of one variable is used as the unit of the ordinates 
and the standard deviation of the other variable as the unit 
of the abscissae, the angles between this straight line of closest 
fit and the axes are significant. If these angles are equal, 
i. e. each 45°, the relationship between the variables is perfect 
(see C—D in Figure 3). If the line coincides with one axis or 
the other no relationship is shown, although the converse is 
not necessarily true.* Positions between these two show 
partial relationship (see A’B’ in Figure 3). 

4. The coefficient of correlation is merely a statement of the
position of the straight line of closest fit on a chart where the
units are the standard deviations of the variables, as this
position is determined by the least square adjustment.† The
coefficient of correlation is expressed as the tangent of the
angle made by the line of closest fit and the axis to which it is
more nearly parallel (e. g. angle X’OB’ in Figure 3 is 27¾°, tan
X’OB’ = +0.526). In actual practice the coefficient of correlation
may be determined mathematically from the data as shown in Table I
without plotting the material on a dot chart like Figure 3.
However, the coefficient should never be attempted without first
investigating the relationship far enough to see if it follows a
straight line. That is, steps 2 and 3 may be omitted in practice;
step 1 should never be omitted.

5. If the examination of the correlation table or dot chart 
shows that the relation is not that of a simple straight line, 
the coefficient of correlation is not a measure of the relation- 
ship between the variables. 
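
Steps 2 to 4 can be illustrated with a short Python sketch: the variables are expressed in units of their standard deviations, the least-squares slope of the one on the other is found, and the angle is taken from its tangent. The function names and the sample values are hypothetical; only the coefficient +0.526 is taken from the text.

```python
import math
from statistics import mean, pstdev


def standardized_slope(xs, ys):
    """Least-squares slope of y on x when both are in standard-deviation units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    u = [(v - mx) / sx for v in xs]
    w = [(v - my) / sy for v in ys]
    return sum(a * b for a, b in zip(u, w)) / sum(a * a for a in u)


def angle_degrees(r):
    """Angle whose tangent is the coefficient of correlation."""
    return math.degrees(math.atan(r))


# Hypothetical observations, invented for illustration.
xs = [2.1, 3.4, 4.0, 4.8, 5.9, 3.1]
ys = [31.0, 30.0, 36.0, 35.0, 39.0, 33.0]

slope = standardized_slope(xs, ys)
print(slope, angle_degrees(slope))
print(angle_degrees(0.526))   # the angle quoted for Figure 3
print(angle_degrees(1.0))     # perfect relationship
```

On such a chart the slope so obtained is the coefficient of correlation itself, and a coefficient of unity corresponds to the 45° line.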


LIMITATIONS OF THE COEFFICIENT OF CORRELATION. 


It is clear even from a superficial study of the question that 
the coefficient of correlation obtained from material where a 
straight line relationship does not obtain may be too small, 


* Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 174-175. 
† Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912, p. 172.
but will never be too large.* A coefficient of correlation may be
near zero when there is very close relationship, as is shown in
such a condition as the relationship between the height of high
water and the phase of the moon which is shown for Old Point
Comfort, Va., by Table III and Figure 4. The figure indicates that
the relation is harmonic; although there is a close and very
definite relation between the phenomena, the coefficient of
correlation is near zero (-0.106 ± .088), because the different
portions of the curve of regression are in such relations to each
other that a straight line along an axis will most nearly satisfy
all the points. Of course the angle is then zero and its tangent
is zero.
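
The point may be illustrated with invented figures. In the Python sketch below the 61 values of x run from 0 to 60, so that their mean is 30 as in the computation accompanying Table III; the heights are not the Old Point Comfort observations but a hypothetical exact harmonic function of x, and the period of 15 is likewise an assumption.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation computed about the true means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    return cov / (sx * sy)


days = list(range(61))  # 61 values with mean 30
# Hypothetical heights: an exact cosine wave, symmetrical about day 30.
heights = [2.65 + 0.4 * math.cos(2 * math.pi * (d - 30) / 15.0) for d in days]

r = pearson_r(days, heights)
print(r)
```

Every height here is fixed exactly by the day, yet the straight line of closest fit lies along the axis and the coefficient vanishes.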

*See Yule, G. U.: Introduction to the Theory of Statistics, ed. 2, London, Griffin & Co., 1912,
p. 175, and Brown, W.: The Essentials of Mental Measurement, Cambridge, University Press, 1911,
pp. 27-59.
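$M_x = 30$          $M'_y = 2.6$          $M_y = 2.6 + \frac{3.05}{61} = 2.65$

$\Sigma x = 0$          $\Sigma y = +3.05$

$\Sigma x^2 = 18910$          $\Sigma y^2 = 3.24$

$\sigma_x = \sqrt{\frac{18910}{61}} = 17.6$          $\sigma_y = \sqrt{\frac{3.24}{61} - \left(\frac{3.05}{61}\right)^2} = .22$

$\Sigma xy = -25.1$

$r = \frac{-25.1 / 61}{17.6 \times .22} = \frac{-.411}{3.87} = -.106 \pm E_r$

$E_r = \pm .674 \, \frac{1 - (-.106)^2}{\sqrt{61}} = \pm .674 \times \frac{1 - .0112}{7.8} = \pm 0.674 \times 0.13 = \pm 0.088$

$r = -0.106 \pm 0.088$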


When the relation is not linear the concomitant variation 
may be shown by the use of a “correlation ratio,” which is 
simply a further development of the theory of correlation.* 

It is, however, not the purpose of this paper to consider 
relationships shown by curves of a higher order than a straight 
line, as such correlations involve more complicated mathemat- 
ical theory and also require many more observations to be 
significant. 




ADEQUACY OF THE COEFFICIENT OF CORRELATION. 


The conclusion seems legitimate that the coefficient of
correlation may be used strictly as a measure of relationship,
when such relationship has been determined by other investigation
to follow straight line relations. The use of the coefficient of
correlation is to be recommended because it is independent of the
personal equation of the investigator, and of the units employed,
and because it shows rigidly the correct position of the line
indicated by the dot chart.


* See Pearson, K.: Mathematical Contributions to the Theory of Evolution: 14. On the general theory
of skew correlation and non-linear regression. London, Drapers Company Research Memoirs, Biometric
Series 2, 1905. Brown, W.: The Essentials of Mental Measurement. Cambridge, University Press, 1911,
pp. 57-59.

In using the coefficient of correlation it is desirable to
calculate the probable error (see Tables I and III for method).*
The probable error is that divergence from the observed mean
on either side within which half the observations lie. Its
size is a measure of how closely the results from an infinite
number of cases would correspond with those obtained from
the observed cases. When the coefficient of correlation is not
greater than its probable error there is no evidence that there
is any correlation; but when the coefficient of correlation is
clearly greater than its probable error correlation is indicated;
and when it is much greater (six times as great is an accepted
empirical amount) it may be safely assumed that there is
concomitant variation.†
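The rule just stated can be put into a few lines of code. The following is only a sketch: it assumes the usual formula for the probable error of a coefficient of correlation, 0.6745(1 - r²)/√n, which agrees with the worked example of the tables (where the factor is rounded to .674); the function names, and the wording returned for the middle case, are ours and not the author's.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of a correlation coefficient r found from n pairs."""
    # 0.6745 converts a standard error into a probable error
    # (the worked example in the text rounds this factor to .674).
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def reading(r: float, n: int) -> str:
    """Apply the three-way rule of thumb described in the text."""
    ratio = abs(r) / probable_error_of_r(r, n)
    if ratio <= 1.0:
        return "no evidence of correlation"
    if ratio >= 6.0:
        return "concomitant variation may be safely assumed"
    return "correlation indicated, more strongly the larger the ratio"


# Values read from the worked example: r = -0.106 from 61 observations.
r, n = -0.106, 61
print(f"r = {r:+.3f}, probable error = {probable_error_of_r(r, n):.3f}")
print(reading(r, n))
```

The text gives no numerical threshold for "clearly greater," so the middle branch only reports that the coefficient lies between one and six probable errors.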

The coefficient of correlation is obtained by applying the 
least square adjustment to all the material and is, therefore, 
the straight line of closest fit. If the relationship is not that 
of a straight line, it is obvious that the straight line of closest 
fit is not a good measure of the relationship and that some 
other measure (e. g., the correlation ratio) must be used. 
Therefore, the coefficient of correlation should never be used 
to show relationship until after the phenomena have been 
investigated, at least far enough to show whether a straight 
line satisfies the relationship as well as any other curve. 
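A small numerical illustration may make the warning concrete. The sketch below uses invented data lying close to a parabola, for which the coefficient of correlation is zero although the dependence of y on x is nearly perfect; the correlation ratio, computed here in its usual form as the square root of the between-array variance of y divided by its total variance, detects the relationship. The data and function names are assumptions made for the example only.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(xs, ys):
    """Correlation ratio of y on x: sqrt(between-array SS / total SS)."""
    n = len(ys)
    my = sum(ys) / n
    arrays = defaultdict(list)          # the y-values observed at each x
    for x, y in zip(xs, ys):
        arrays[x].append(y)
    between = sum(len(g) * (sum(g) / len(g) - my) ** 2 for g in arrays.values())
    total = sum((y - my) ** 2 for y in ys)
    return math.sqrt(between / total)


# Hypothetical observations: y = x^2 with a small scatter of +/- 0.5.
xs = [-3, -2, -1, 0, 1, 2, 3] * 2
ys = [x * x + d for x, d in zip(xs, [0.5] * 7 + [-0.5] * 7)]

print("coefficient of correlation:", round(pearson_r(xs, ys), 3))
print("correlation ratio:         ", round(correlation_ratio(xs, ys), 3))
```

A dot chart of these points would show at once that no straight line fits them, which is the preliminary investigation the paragraph above calls for.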


LITERATURE. 


The development of the theory of correlation resulting in
the adoption and use of the coefficient of correlation is, of
course, largely mathematical. While the literature on the
subject is considerable, the greater part of the contributions
are concerned with the application of the coefficient to
particular problems, and hence the development of the theory of
correlation is incidental and widely scattered.

* For a general discussion of the significance of probable error see Yule, G. U.: Introduction to the
Theory of Statistics, ed. 2, London, Griffin & Co., 1912, pp. 310-311.
† See Bowley, A. L.: Elements of Statistics, ed. 3, New York, Scribner, 1907, p. 320.

“The fundamental theorems of correlation were for the first
time and almost exhaustively discussed by A. Bravais*
[more than] half a century ago. He deals completely with the
correlation of two and three variables. Forty years later Mr.
J. D. Hamilton Dickson† dealt with a special problem proposed
to him by Mr. Galton, and reached on a somewhat narrow
basis some of Bravais’ results for correlation of two variables.
Mr. Galton at the same time introduced an improved notation
which may be summed up in the ‘Galton function’ or coefficient
of correlation. This indeed appears in Bravais’ work,
but a single symbol is not used for it. In 1892 Professor Edgeworth,
also unconscious of Bravais’ memoir, dealt in a paper
on ‘Correlated Averages’ with correlation for three variables.‡
He obtained results identical with Bravais, although expressed
in terms of ‘Galton’s functions.’”§

* Analyse mathématique sur les probabilités des erreurs de situation d'un point. Paris, Académie des
Sciences, Mémoires présentés par divers savants, Series 2, Vol. 9, 1846, pp. 255-332.

† Appendix to Galton, F.: Family Likeness in Stature. London, Royal Society, Proceedings, Vol. 40,
1886, pp. 63-73.

‡ London, Philosophical Magazine, Series 5, Vol. 34, 1892, pp. 190-204.

§ Pearson, Karl: London, Royal Society, Philosophical Transactions, Series A, Vol. 187, 1896, p. 261.

The following publications contain complete statements of
the later development:


Pearson, Karl: Contributions to the mathematical theory of evolution;
London, Royal Society, Philosophical Transactions, Series A, as
follows:


1. On the dissection of frequency curves, Vol. 185, 1894, pp. 71-110.

2. Skew variations in homogeneous material, Vol. 186, 1895, pp. 343-414.

3. Regression, heredity, and panmixia, Vol. 187, 1896, pp. 253-318.

4. On the probable errors of frequency constants and on the influence
of random selection on variation and correlation, Vol. 191, 1898,
pp. 229-311.

5. On the reconstruction of the stature of prehistoric races, Vol. 192,
1898, pp. 169-244.

6. Genetic (reproductive) selection; inheritance of fertility in man and
of fecundity in thoroughbred race horses, Vol. 192, 1899, pp. 257-330.

7. On the correlation of characters not quantitatively measurable,
Vol. 195, 1900, pp. 1-47.

8. On the inheritance of characters not quantitatively measurable,
Vol. 195, 1900, pp. 75-150.

9. On the principle of homotyposis and its relation to heredity, to the 
variability of the individual, and to that of the race, Vol. 197, 
1901, pp. 285-379. 

10. Supplement to a memoir on skew variation, Vol. 197, 1901, pp. 443- 
459. 

11. On the influence of natural selection on the variability and correla- 
tion of organs, Vol. 200, 1902, pp. 1-66. 

12. On a generalized theory of alternative inheritance with special 
reference to Mendel’s Laws, Vol. 203, 1904, pp. 53-86. 


In London, Drapers’ Company Research Memoirs, Biometric Series. 


13. On the theory of contingency and its relation to association and 
normal correlation. Memoir 1. 

14. On the general theory of skew correlation and non-linear regression. 
Memoir 2. 

15. On the mathematical theory of random migration. Memoir 3, 1906. 

16. On further methods of determining correlation. Memoir 4, 1907. 

17. [Not published.] 

18. On a novel method of regarding the association of two variates 
classed solely in alternate categories. Memoir 7, 1912. 


Pearson, Karl: On the partial correlation ratio. London, Royal Society,
Proceedings, Series A, Vol. 91, 1915, pp. 492-498.

Brown, W.: The essentials of mental measurement, Cambridge, University
Press, 1911.

Elderton, W. P.: Frequency curves and correlation. London, Layton
Brothers, 1906.

Hooker, R. H.: Correlation of successive observations, Royal Statistical
Society Journal, Vol. 68, pp. 676-703.

Tolley, H. R.: The theory of correlation as applied to farm survey data
on fattening baby beef, U. S. Department of Agriculture Bul. 504,
Washington, Govt. Ptg. Office, 1917.

Walker, Gilbert T.: Correlation in seasonal variation of weather,
Indian Meteorological Department Memoirs, Simla, 1909-1915.


1. Correlation in seasonal variation of climate, Vol. 20, part 6, 1909, pp.
117-124.

2. (A) On the probable error of a coefficient of correlation with a group 
of factors. 

(B) Some applications of statistical methods to seasonal forecasting, 

Vol. 21, part 2, 1910, pp. 22-45. 

3. On the criterion for the reality of relationships or periodicities, Vol. 
21, part 9, 1914, pp. 13-16. 

4. Sunspots and rainfall, Vol. 21, part 10, 1915, pp. 17-60. 

5. Sunspots and temperature, Vol. 21, part 11, 1915, pp. 61-90. 

6. Sunspots and pressure, Vol. 21, part 12, 1915, pp. 91-118. 
Yule, G. Udny: Introduction to the theory of statistics, ed. 2, London,
C. Griffin & Co., 1912, pp. 157-253.


More elementary discussions are contained in the following 
papers: 


Persons, W. M.: The correlation of economic statistics. Boston, Ameri- 
can Statistical Association, Quarterly Publications, Vol. 12 (1910), 
pp. 287-322. 

Hooker, R. H.: An elementary explanation of correlation: illustrated by 
rainfall and the depth of water in a well; London, Royal Meteorological 
Society Quarterly Journal, Vol. 34, 1908, pp. 277-291. 

Elderton, W. P. and E. M.: Primer of statistics, London, A. and C.
Black, 1910, pp. 55-72.

King, W. I.: Elements of statistical method, New York, Macmillan, 1912,
pp. 197-215.

Dines, W. H.: The practical application of statistical methods to meteorology.
London, H. M. Meteorological Office, The computer’s handbook
(M. O. 223), section 5, part 2, 1915, pp. V29-V52.


The most complete bibliographies will be found in: 


Yule, G. Udny: Introduction to the theory of statistics, London, C.
Griffin & Co., 1912, pp. 188, 208-209, 225-226, and 252.

Davenport, C. B.: Statistical methods with special reference to biological 
variation, third, revised edition, New York, J. Wiley & Sons, 1914, 
pp. 62 and 85-104. 
AN INTERPOLATION FORMULA FOR “EQUIDISTANT”
FREQUENCY DISTRIBUTIONS.


By Harry Langman, Ocean Accident & Guarantee Corporation,
New York City.


A frequency distribution is conveniently represented by
the area under some continuous curve. Moreover, the total
frequency is usually included between two distinct values of
the independent variable, outside of which there is no frequency.
Hence, if $x$ denote the independent variable, and
$y = f(x)$ denote the curve bounding the distribution, we must
have
(1)    $f(a) = f(b) = 0,$
where $a$ and $b$ are the extreme values of $x$.

The frequency between any two values of $x$, $\alpha$ and $\beta$, is
given by $\int_{\alpha}^{\beta} y\,dx$. In the usual case the values of this integral
are known for each of the sub-intervals obtained by dividing
the entire interval $[a, b]$ into equal parts. Let $m$ be the
number of sub-intervals, and for convenience choose each
sub-interval of unit length. Then
(2)    $b = a + m.$
Let $\alpha_k$, where $k = 1, 2, 3, \ldots, m$, be the known frequencies over
each of the sub-intervals $[a, a+1]$, $[a+1, a+2]$, $\ldots$, $[a+k-1, a+k]$,
$\ldots$, $[a+m-1, a+m]$. Then
(3)    $\alpha_k = \int_{a+k-1}^{a+k} y\,dx; \quad k = 1, 2, 3, \ldots, m.$

The problem then is to interpolate frequencies for other
than the equidistant values of the independent variable.
This consists in finding some function $f(x)$ which will satisfy
the $m+2$ conditions of (1) and (3).* Since this is infinitely
indeterminate, the problem reduces to finding a form for
$f(x)$ which can be handled practically, and which can be applied
in all cases.

*Cf. the problem first solved by Lagrange, Oeuvres 1, p. 87. Also articles in Enzyk. d. Math.
Wissen. IIA, 9a, p. 647.

For this purpose we may choose for $y = f(x)$ the function
(4)    $y = \sum_{n=1}^{m} a_n \dfrac{n\pi}{m} \sin \dfrac{(x-a)\,n\pi}{m}.$
By (2), this evidently satisfies the conditions (1). The conditions
(3) then become
(5)    $\alpha_k = \sum_{n=1}^{m} a_n \left[\cos \dfrac{n\pi}{m}(k-1) - \cos \dfrac{n\pi}{m}k\right]; \quad k = 1, 2, 3, \ldots, m.$
The $m$ conditions (5) are just sufficient to determine the $m$
constants $a_n$. The problem now is to evaluate these coefficients.
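Before the coefficients are evaluated in closed form, it may help to see that the conditions (5) really are an ordinary set of m linear equations in m unknowns. The sketch below solves them by straightforward elimination. It assumes that the function (4) is the sine sum with the factor nπ/m attached to each coefficient, so that the bracket in (5) is cos nπ(k-1)/m - cos nπk/m; the five frequencies are invented for the illustration and every name is ours.

```python
import math


def condition_matrix(m):
    """C[k-1][n-1] = cos(n*pi*(k-1)/m) - cos(n*pi*k/m), the bracket in (5)."""
    return [[math.cos(n * math.pi * (k - 1) / m) - math.cos(n * math.pi * k / m)
             for n in range(1, m + 1)]
            for k in range(1, m + 1)]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, size):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


freqs = [2.0, 7.0, 12.0, 6.0, 3.0]      # hypothetical frequencies alpha_k
m = len(freqs)
C = condition_matrix(m)
coeffs = solve(C, freqs)                # the constants a_1 ... a_m
recovered = [sum(C[k][n] * coeffs[n] for n in range(m)) for k in range(m)]
print([round(a, 4) for a in coeffs])
print([round(f, 4) for f in recovered])
```

Elimination of this kind grows laborious as m increases, which is the reason for seeking the explicit expressions derived in what follows.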

Suppose the $m$ equations of (5) to be written in succession
and that we add the first $k$ of them together, for all values
of $k$. Then, if we put
(6)    $s_k = \alpha_1 + \alpha_2 + \alpha_3 + \cdots + \alpha_k,$
we have
$s_k = \sum_{n=1}^{m} a_n \left[1 - \cos\dfrac{n\pi}{m}k\right] = \sum_{n=1}^{m} a_n - \sum_{n=1}^{m} a_n \cos \dfrac{n\pi}{m}k.$
Letting
(7)    $A = \sum_{n=1}^{m} a_n,$
the last may be written
(8)    $s_k = A - \sum_{n=1}^{m} a_n \cos\dfrac{n\pi}{m}k; \quad k = 1, 2, 3, \ldots, m.$
The $m$ conditions of (8) replace those of (5).
Let $r$ be any integer from 1 to $m$. Then
(9)    $r \leq m; \quad n \leq m.$
Now consider the $m$ quantities
(10)    $2\cos\dfrac{r\pi}{m}k; \quad k = 1, 2, 3, \ldots, m.$
Multiplying each of the equations of (8) by the corresponding
quantity of (10) and adding the resulting equations we obtain
(11)    $2\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k = A\,I(r) + \sum_{n=1}^{m} a_n K(n, r),$
where
(12)    $I(r) = 2\sum_{k=1}^{m} \cos\dfrac{r\pi}{m}k$
and
(13)    $K(n, r) = -2\sum_{k=1}^{m} \cos\dfrac{n\pi}{m}k \cdot \cos\dfrac{r\pi}{m}k.$
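From trigonometry, we have
(14)    $\sum_{p=1}^{q} \cos p\theta = -\dfrac{1}{2} + \dfrac{\sin\left(q+\frac{1}{2}\right)\theta}{2\sin\frac{\theta}{2}}.$
Putting $q = m$, and $\theta = \dfrac{r\pi}{m}$ in (14), (12) becomes
(15)    $I(r) = -1 + \dfrac{\sin\left[r\pi + \frac{r\pi}{2m}\right]}{\sin\frac{r\pi}{2m}}.$
By (9), the denominator of (15) never vanishes. Hence:
(16)    if $r$ is even, $I(r) = 0$; if $r$ is odd, $I(r) = -2$.
From trigonometry,
$2 \cos x \cos y = \cos (x+y) + \cos (x-y).$
Applying this, (13) becomes
(17)    $K(n, r) = -\sum_{k=1}^{m}\left[\cos\dfrac{(n+r)\pi}{m}k + \cos\dfrac{(n-r)\pi}{m}k\right] = -\sum_{k=1}^{m} \cos \dfrac{(n+r)\pi}{m}k - \sum_{k=1}^{m} \cos\dfrac{(n-r)\pi}{m}k.$
Again using (14), (17) becomes
(18)    $K(n, r) = 1 - \dfrac{\sin\left[(n+r)\pi + \frac{(n+r)\pi}{2m}\right]}{2\sin\frac{(n+r)\pi}{2m}} - \dfrac{\sin\left[(n-r)\pi + \frac{(n-r)\pi}{2m}\right]}{2\sin\frac{(n-r)\pi}{2m}}.$
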
Now suppose $n+r$ is even; then $n-r$ is even, and the angles
$(n+r)\pi$ and $(n-r)\pi$ may be dropped from the brackets in
(18). In that case $K = 0$, except when the denominator of
either fraction vanishes. This can happen only when $\dfrac{n+r}{2m}$
or $\dfrac{n-r}{2m}$ is integral. By (9), $\dfrac{n+r}{2m}$ is integral only when $n = r = m$;
$\dfrac{n-r}{2m}$ is integral only when $n = r$. Hence we have:
(19)    if $n+r$ is even, and $n \neq r$, $K(n, r) = 0$.

Now suppose $n+r$ is even and $n = r$. In this case (18) is
indeterminate. However, on returning to (17) we obtain
(18a)    $K(r, r) = \dfrac{1}{2} - \dfrac{\sin\left[2r\pi + \frac{r\pi}{m}\right]}{2\sin\frac{r\pi}{m}} - m.$
Hence $K = -m$, unless the denominator is zero. This occurs
only when $r = m$. Hence we have:
(20)    if $r \neq m$, $K(r, r) = -m$.
Suppose $n = r = m$. Substituting in (17) we get at once:
(21)    $K(m, m) = -2m.$

Consider $n+r$ odd. Then $n-r$ is odd, and the angles
$(n+r)\pi$ and $(n-r)\pi$ in the brackets of (18) may each be
replaced by $\pi$. It is clear, furthermore, that in this case the
denominators of (18) cannot vanish. Hence we have
(22)    if $n+r$ is odd, $K(n, r) = 2$.


For definiteness, we shall now suppose $m$ odd. Then we
may conveniently represent the results of (16), (19), (20), (21),
(22), in the following table:

    r even, I = 0:     n = r               K = -m
                       n even, n ≠ r       K = 0
                       n odd               K = 2

    r odd, I = -2:     n = r = m           K = -2m
                       n = r ≠ m           K = -m
                       n odd, n ≠ r        K = 0
                       n even              K = 2



We note that for each value of r from 1 to m we get a set of 
multipliers in (10), and a new equation (11) from equations (8). 
We thus obtain m new equations to replace those of (8). 
That the new set is equivalent to (8) is shown by the fact that 
a definite solution is obtained. 
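The values collected in the table are easily verified by machine. The following sketch evaluates the sums (12) and (13) directly and compares them with the stated results (16) and (19) to (22); the function names are ours, and the check is run for even as well as odd m, since those results do not depend on the parity of m.

```python
import math


def I_sum(r, m):
    """The sum (12): 2 * sum over k of cos(r*pi*k/m)."""
    return 2 * sum(math.cos(r * math.pi * k / m) for k in range(1, m + 1))


def K_sum(n, r, m):
    """The sum (13): -2 * sum over k of cos(n*pi*k/m) * cos(r*pi*k/m)."""
    return -2 * sum(math.cos(n * math.pi * k / m) * math.cos(r * math.pi * k / m)
                    for k in range(1, m + 1))


def expected_I(r):
    return 0 if r % 2 == 0 else -2           # result (16)


def expected_K(n, r, m):
    if (n + r) % 2 == 1:
        return 2                             # result (22)
    if n != r:
        return 0                             # result (19)
    return -2 * m if r == m else -m          # results (21) and (20)


def table_holds(m, tol=1e-9):
    """True when every I(r) and K(n, r) for 1 <= n, r <= m matches the table."""
    for r in range(1, m + 1):
        if abs(I_sum(r, m) - expected_I(r)) > tol:
            return False
        for n in range(1, m + 1):
            if abs(K_sum(n, r, m) - expected_K(n, r, m)) > tol:
                return False
    return True


print(all(table_holds(m) for m in range(1, 13)))
```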

We may put
(23)    $M = a_1 + a_3 + a_5 + \cdots + a_m;$
        $N = a_2 + a_4 + a_6 + \cdots + a_{m-1}.$
Then
        $A = M + N.$

Consider first the irregular case $r = m$. From the table we
obtain: $I = -2$; for $n$ even, $K = 2$; for $n$ odd and $n \neq m$, $K = 0$;
for $n = r = m$, $K = -2m$. Hence equation (11) takes the form
$2\sum_{k=1}^{m} s_k \cos \pi k = -2A - 2m a_m + 2(a_2 + a_4 + a_6 + \cdots + a_{m-1}) = -2A + 2N - 2m a_m,$
or
(24)    $\sum_{k=1}^{m} (-1)^k s_k = -M - m a_m.$

Let $r$ assume any of the even values from 2 to $m-1$. From
the table, $I = 0$; $K = 2$ if $n$ is odd; $K = 0$ if $n$ is even and $n \neq r$;
$K = -m$ if $n = r$. Hence (11) becomes
(25)    $2\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k = 2(a_1 + a_3 + \cdots + a_m) - m a_r = 2M - m a_r; \quad r = 2, 4, 6, \ldots, m-1.$

Let $r$ assume any of the odd values allowed by (9), exclusive
of $r = m$, already considered. From the table we then obtain,
$I = -2$; $K = 2$ if $n$ is even; $K = 0$ if $n$ is odd and $n \neq r$; $K = -m$
if $n = r$. Hence (11) becomes
(26)    $2\sum_{k=1}^{m} s_k \cos \dfrac{r\pi}{m}k = -2A + 2(a_2 + a_4 + \cdots + a_{m-1}) - m a_r = -2M - m a_r; \quad r = 1, 3, 5, \ldots, m-2.$

We have then $m$ equations in (24), (25), (26). We now proceed
to determine $M$. Adding together the $\dfrac{m-1}{2}$ equations
of (26), we obtain
(27)    $2\sum_{i=1}^{\frac{m-1}{2}} \sum_{k=1}^{m} s_k \cos \dfrac{(2i-1)\pi}{m}k = -(m-1)M - m(a_1 + a_3 + a_5 + \cdots + a_{m-2}) = -(m-1)M - m(M - a_m) = -(2m-1)M + m a_m.$

Adding (24) and (27), we obtain
(28)    $M = -\dfrac{1}{2m}\left[\sum_{k=1}^{m} (-1)^k s_k + 2\sum_{i=1}^{\frac{m-1}{2}} \sum_{k=1}^{m} s_k \cos\dfrac{(2i-1)\pi}{m}k\right].$

This gives us $M$ in terms of definitely known quantities.
From (24), (25), (26), the coefficients $a_r$ are then determined.
We can, however, decidedly simplify the expression for $M$ in
(28).

From trigonometry,
$\sum_{u=1}^{q} \cos (2u-1)\theta = \dfrac{\sin 2q\theta}{2\sin\theta},$
unless $\sin\theta = 0$. Using this, (28) becomes in order

$M = -\dfrac{1}{2m}\left[\sum_{k=1}^{m}(-1)^k s_k + 2\sum_{k=1}^{m} s_k \sum_{i=1}^{\frac{m-1}{2}} \cos (2i-1)\dfrac{k\pi}{m}\right]$

$= -\dfrac{1}{2m}\left[\sum_{k=1}^{m}(-1)^k s_k + 2\sum_{k=1}^{m-1} s_k \sum_{i=1}^{\frac{m-1}{2}} \cos (2i-1)\dfrac{k\pi}{m} + 2 s_m \sum_{i=1}^{\frac{m-1}{2}} \cos (2i-1)\pi\right]$

$= -\dfrac{1}{2m}\left[\sum_{k=1}^{m}(-1)^k s_k + 2\sum_{k=1}^{m-1} s_k \dfrac{\sin\frac{(m-1)k\pi}{m}}{2\sin\frac{k\pi}{m}} - (m-1)s_m\right]$

$= -\dfrac{1}{2m}\left[\sum_{k=1}^{m}(-1)^k s_k - \sum_{k=1}^{m-1}(-1)^k s_k - (m-1)s_m\right]$

$= -\dfrac{1}{2m}\left[-s_m - m s_m + s_m\right]$

(29)    $= \dfrac{s_m}{2}.$
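From (24), (25), (26), we now obtain the coefficients $a_r$ in
the following forms:

(30)    $a_m = -\dfrac{1}{m}\left\{\dfrac{s_m}{2} + \sum_{k=1}^{m}(-1)^k s_k\right\};$

        $a_r = \dfrac{1}{m}\left\{s_m - 2\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k\right\}, \quad r = 2, 4, 6, \ldots, m-1;$

        $a_r = -\dfrac{1}{m}\left\{s_m + 2\sum_{k=1}^{m} s_k\cos\dfrac{r\pi}{m}k\right\}, \quad r = 1, 3, 5, \ldots, m-2.$

If in (4) $a_m$ be replaced by $\dfrac{a_m}{2}$, the three forms (30) for the
coefficients $a_r$ can be expressed in the single form

(31)    $a_r = (-1)^r \dfrac{s_m}{m} - \dfrac{2}{m}\sum_{k=1}^{m} s_k \cos \dfrac{r\pi}{m}k.$
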

Suppose now $m$ is even. Then we obtain in manner parallel
to the case $m$ odd:

(28′)    $M = -\dfrac{1}{m}\sum_{i=1}^{\frac{m}{2}}\sum_{k=1}^{m} s_k \cos\dfrac{(2i-1)\pi}{m}k$

(29′)    $= \dfrac{s_m}{2}.$

*In practical computation, the two values found for M serve as an invaluable check.

The coefficients $a_r$ then become

(30′)    $a_m = \dfrac{1}{m}\left\{\dfrac{s_m}{2} - \sum_{k=1}^{m}(-1)^k s_k\right\};$

         $a_r = \dfrac{1}{m}\left\{s_m - 2\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k\right\}, \quad r = 2, 4, 6, \ldots, m-2;$

         $a_r = -\dfrac{1}{m}\left\{s_m + 2\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k\right\}, \quad r = 1, 3, 5, \ldots, m-1.$

From (30′) it is clear that (31) gives the coefficients $a_r$ also
when $m$ is even. Having expressed the coefficients $a_r$ in
terms of the definitely known quantities $s_k$, the function (4) is
then completely determined.*
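The whole procedure may be sketched in code as follows. The sketch rests on our reading of the formulas: that (31) is a_r = (-1)^r s_m/m - (2/m) Σ s_k cos(rπk/m), and that the last coefficient a_m enters the function (4) with half weight. The frequencies are invented, and the function names are ours. The fitted curve is integrated in closed form, so that the frequency over any part of the range, and not merely over the given sub-intervals, can be read off.

```python
import math
from itertools import accumulate


def coefficients(freqs):
    """Coefficients a_1 ... a_m from the single form (31)."""
    m = len(freqs)
    s = list(accumulate(freqs))          # s_k = alpha_1 + ... + alpha_k
    return [((-1) ** r) * s[-1] / m
            - (2.0 / m) * sum(s[k - 1] * math.cos(r * math.pi * k / m)
                              for k in range(1, m + 1))
            for r in range(1, m + 1)]


def frequency_between(coeffs, u, v):
    """Area under the fitted curve from x = a + u to x = a + v (0 <= u <= v <= m)."""
    m = len(coeffs)
    total = 0.0
    for n, a_n in enumerate(coeffs, start=1):
        weight = 0.5 * a_n if n == m else a_n     # a_m is halved in (4)
        total += weight * (math.cos(n * math.pi * u / m)
                           - math.cos(n * math.pi * v / m))
    return total


freqs = [2.0, 7.0, 12.0, 6.0, 3.0]       # hypothetical frequencies alpha_k
coeffs = coefficients(freqs)

# The given frequencies are reproduced over the unit sub-intervals ...
print([round(frequency_between(coeffs, k, k + 1), 6) for k in range(len(freqs))])
# ... and each may now be divided at its midpoint.
print([(round(frequency_between(coeffs, k, k + 0.5), 4),
        round(frequency_between(coeffs, k + 0.5, k + 1), 4))
       for k in range(len(freqs))])
```

The same two functions serve for even m without change, which is the practical advantage of the single form (31).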
In practical computation, the labor involved consists primarily
in computing the values of $\sum_{k=1}^{m} s_k \cos\dfrac{r\pi}{m}k$ for all values of
$r \leq m$. This requires the $m^2$ values of $s_k \cos\dfrac{r\pi}{m}k$. But it is
readily observed that all the $m^2$ values of $\cos\dfrac{r\pi}{m}k$ are given by
only $\dfrac{m-1}{2}$ of them (if $m$ is odd). Furthermore, $s_k$ is a ‘constant
multiplier’ for each of the values of $\cos\dfrac{r\pi}{m}k$, $r \leq m$. In
practice, further use can be made of the simple relations between
the trigonometric functions to materially shorten the
work. The usual case arising for interpolation is to obtain
the frequencies for midway divisions of the sub-intervals.
The computation made necessary for this is about equivalent
to that in obtaining the coefficients $a_r$ from the given frequencies
$\alpha_k$.
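The economy in the cosines can be illustrated as follows. The sketch assumes that the count intended in the text, for m odd, is (m - 1)/2: every angle rkπ/m is a multiple jπ/m, and by the periodicity and symmetry of the cosine each such value is either ±1 or plus or minus one of the cosines of π/m, 2π/m, ..., (m - 1)π/2m. The function name is ours.

```python
import math


def cosine_lookup(m):
    """For odd m, return (lookup, count): lookup(j) gives cos(j*pi/m) for any
    integer j, using only `count` = (m - 1) // 2 cosines actually evaluated."""
    if m % 2 == 0:
        raise ValueError("this sketch covers the case of m odd only")
    half = (m - 1) // 2
    base = {j: math.cos(j * math.pi / m) for j in range(1, half + 1)}

    def lookup(j):
        j %= 2 * m                  # the cosine repeats after 2m steps of pi/m
        if j > m:
            j = 2 * m - j           # cos(2*pi - t) = cos(t)
        if j == 0:
            return 1.0
        if j == m:
            return -1.0
        if j > half:
            return -base[m - j]     # cos(pi - t) = -cos(t)
        return base[j]

    return lookup, len(base)


lookup, count = cosine_lookup(7)
print(count, "cosines evaluated for m = 7")
print(round(lookup(3 * 5), 6), round(math.cos(3 * math.pi * 5 / 7), 6))
```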

*If the number of sub-intervals of $[a, b]$, $m$, be indefinitely increased, it may be shown without
difficulty, as is to be expected, that the function (4) becomes a Fourier series.







REVIEWS AND NOTES. 


Poverty and Social Progress. By Maurice Parmelee, Ph.D. New York:
The Macmillan Company, 1916.

Professor Parmelee has undertaken in this book of nearly five hundred 
pages to ‘‘give a comprehensive survey of the problems of poverty which 
shows the one-sided character of many of the explanations of its causation 
and which will furnish the starting point of an effective program of preven- 
tion.” This promise of the preface has been fulfilled in a more satisfying 
manner than in any American work on this subject. In fact, there is none 
to be compared with it in its broad comprehensiveness. Like most books 
of moderate size that achieve a comprehensiveness, it cannot be intensive 
within its physical limits. It must be, and is, summary in its treatment. 

After a short introduction, the study falls into two natural subdivisions: 
the causes and conditions of poverty; and the remedial and preventive 
measures, both those in use and those suggested. 

Causes and conditions are summarized as follows: Factors biological, 
such as inheritance, the pathology of body and of mind; factors involved 
in the distribution of wealth and income; factors inherent in the present 
organization of labor, e. g., unemployment, sweating, child-labor; the rela- 
tionship of population, the means of subsistence; and political and domestic 
maladjustment. 

In his survey of remedies and prevention, Professor Parmelee discusses 
humanitarianism, philanthropy, and eugenics, subjecting the claims of each 
as a remedy of the evil to a sharp criticism. Thrift, social insurance, and 
pensions are shown to be limited in their effectiveness, both as remedies 
and preventives. True prevention must, of course, be the aim of society 
in its treatment of poverty. Social legislation tends in that direction, 
though with only partial success. A study of the economic organization 
of modern society reveals certain basic facts. These are: the enormous 
concentration of wealth and income of a few; the instability of industry; the 
positive waste of economic goods and the negative waste caused by ineffi- 
cient employment of human energies. Only fundamental, deep-striking 
reforms of these defects of our economic system can be relied upon to help 
us out of this perpetual slough of despond. Social machinery which will 
raise wages, regulate the labor supply, redistribute income from ownership 
of property, eliminate waste due to instability of industry and to lack of 
full utilization of the ability of the worker, will be effective in a large degree. 
The alternatives of highly regulated competition, and of collectivist indus- 
trial democracy afford two possible basic principles upon which to elaborate 
such machinery. Such machinery would require a political democracy 
more efficient than any we now know; and it would imply a development of 
the “world state” or some step in that direction. 
As for the other half of the title of the book, “social progress,” our author
defines its aim as the coöperative attainment of the well-being of all mankind;
its standard of value is frankly and nobly hedonistic. Human well-being
is the coming of the “normal life” for all, or the vast majority, of
mankind; the spontaneous expression of human nature. Through democracy
alone will come the abolition of poverty, the greatest step toward the
state of the “normal life.”

The statistician is of course interested in the use which Professor Parmelee 
makes of the statistical material upon which in part his argument rests. 
This material is found in the chapters on the distribution of wealth and of 
income; on the extent of poverty; on unemployment; on the sweating system;
on child and woman labor; strikes; on the growth of population and the 
increase of wealth; on domestic and matrimonial maladjustment. 

A careful survey of the use Professor Parmelee has made of his statistical 
data gives the reader the impression that he has in no case (except one) 
drawn a more than conservative conclusion from the voluminous and highly 
heterogeneous material. He does not usually make a severe criticism of 
his sources; but he refuses to admit that Hunter’s facts prove the conten- 
tion the latter makes as to the extent of poverty. He regards this con- 
clusion of Hunter’s, however, as borne out in large part by corroboratory 
data. Professor Parmelee’s careful use in every case of the data offered 
by the specialists relieves him of the need of so rigorously critical a treat- 
ment. In other words, he accepts the data as somewhat inexact, but does 
not attempt to do what too many students (notably Hunter) have done, 
namely to make exact inductions from inexact statements of the facts. 

One point which will interest the student of the problem of immigration 
is the conclusion that Professor Parmelee comes to regarding the relation- 
ship between immigration and the birth rate of the original English-speaking 
stock of the United States. On the basis of the estimate in the census 
publication, “A Century of Population Growth,” he admits the probability
of Walker’s well-known statement in this matter, and adds that immigration 
has not increased population in the United States but may even have 
checked its growth. That is a conclusion that might be difficult to justify 
from logical consideration of his data, as well as upon other grounds. 

In conclusion, one point arises for question rather than criticism. What
does Professor Parmelee really mean by the coming of the “normal life”
as the “object of social progress?” To define the “normal life” as the
“spontaneous expression of human nature” is good philosophy after the
style of Plato. It lacks the objectivity that we are demanding today.
The definition seems to the writer less definite than the term defined. The
lower limit of the social conditions of the normal life is by implication
discussed in the book. We could have desired our author formally to set
at least a theoretical norm that would give us more than a hint of the concrete
meaning both of “spontaneous development” and of “human nature.”
But has this more than an academic importance?

C. E. Gehlke.


Western Reserve University. 
Tables: To Facilitate the Calculation of Partial Coefficients of Correlation
and Regression Equations. Truman Lee Kelley, Austin, Texas. Bulletin
of the University of Texas.

In these tables Professor Kelley has supplied a substitute for most of the 
numerous arithmetic operations which occur in recurring fashion in the 
computation of partial coefficients of correlation. In computing the co- 
efficients of regression, the indices of higher order are obtained from those 
of the first order by means of three fundamental equations of the following 
form: 
It is to be noted that the formula (1) for obtaining a partial coefficient of
correlation of any order from those of the next lower order is of the same
form for all orders. To this end Professor Kelley has computed the values
of $\dfrac{1}{\sqrt{1-a^2}\,\sqrt{1-b^2}}$ and $\dfrac{ab}{\sqrt{1-a^2}\,\sqrt{1-b^2}}$ for all values of $a$ and $b$ from 0 to 1
for equal differences of .01; in all, 20,000 tabular entries. Using equation
(1), these tables facilitate the computation of successive orders of partial
coefficients, starting with total coefficients of correlation.
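How two such tables shorten the work may be sketched as follows. The sketch assumes that formula (1) is the familiar recursion for a partial coefficient of the first order, r12.3 = (r12 - r13 r23) / (√(1 - r13²) √(1 - r23²)), so that one tabular entry multiplies r12 and the other is subtracted. The arguments are taken at 0.00, 0.01, ..., 0.99, which gives the 20,000 entries mentioned; the names and the treatment of negative coefficients are ours, and nothing here reproduces the layout of Professor Kelley's actual tables.

```python
import math

STEPS = 100   # arguments 0.00, 0.01, ..., 0.99 (the factors are infinite at 1.00)


def build_tables():
    """Tabulate 1/(sqrt(1-a^2) sqrt(1-b^2)) and ab/(sqrt(1-a^2) sqrt(1-b^2))."""
    reciprocal, product = {}, {}
    for i in range(STEPS):
        for j in range(STEPS):
            a, b = i / STEPS, j / STEPS
            denom = math.sqrt(1 - a * a) * math.sqrt(1 - b * b)
            reciprocal[i, j] = 1.0 / denom
            product[i, j] = a * b / denom
    return reciprocal, product


RECIPROCAL, PRODUCT = build_tables()


def partial_r(r12, r13, r23):
    """r12.3 by table lookup; r13 and r23 are assumed given to two decimals
    and numerically less than 1."""
    i, j = round(abs(r13) * STEPS), round(abs(r23) * STEPS)
    # The tables cover positive arguments; the sign of a*b is restored here.
    sign = math.copysign(1.0, r13) * math.copysign(1.0, r23)
    return r12 * RECIPROCAL[i, j] - sign * PRODUCT[i, j]


print(len(RECIPROCAL) + len(PRODUCT), "tabular entries")
print(round(partial_r(0.50, 0.30, 0.40), 4))
```

Coefficients of higher order follow by feeding the first-order results back into the same function, which is the sense in which the formula is "of the same form for all orders."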
The author gives another table for values of $r$, $r^2$, $1-r^2$, $\sqrt{1-r^2}$, a
.01. Using equation (2), this facilitates obtaining the standard deviation
of any order from the standard deviation of lowest order. From these 
the regression coefficients are obtained at once by formula (3). The 
procedure involves getting first r, then o, then b. 

In his preface, the author promises the extension of the tables for smaller 
subdivisions of the variables, carried out also to further places of decimals. 
Though such tables would be more serviceable at the extreme values of the 
variables, it is questionable whether or not they would lead to more sub- 
stantial results in forming conclusions based upon the coefficients of cor- 
relation. A difference of .001, for example, in any coefficient of correlation, 
is utterly meaningless to anyone. 

In his discussion on “The Function of Partial Coefficients of Correlation
and Regression Equations,” Professor Kelley illustrates the use of his
tables by several examples in detail. He presents a very convenient
tabular form for the practical computation of regression equations, with a
detailed discussion of approximations to the regression equation for cases
involving a large number of variables.

The presentation, however, adopts poor pedagogic procedure. The ap- 
parent implicit confidence in the value of these partial coefficients of cor- 
relation, though quite orthodox, seems calculated to create in the mind of 
the student of statistics a blind faith in the subtle power of this form of 
analysis. This would tend to make the use of coefficients of correlation 
simply mechanical, and to that extent eliminate the supreme advantage of 
naive statistical thought. One would fail to realize that the regression 
equation is not the final word in any statistical analysis. At best, it is but 
a convenient approximate representation of the relation between a variable 
and other variables which partially determine it. Though the tables of 
themselves are valuable, the author’s comments would tend to be misleading 
to one who is unfamiliar with the precise arbitrary nature of a regression 
equation. Moreover, regressions are sometimes used for purposes of 
interpolation for such values of the variables where the actual regression 
equation is anything but linear. This should be guarded against. Such 
comments would of course lose their force when applied to what we may
term regression hyper-surfaces, and the associated correlation ratios.

The author vaguely refers to $\sqrt{1-r^2}$ as the “coefficient of independence,”


and te : : as the “measure of importance,”’ “measure of significance,” 
-r 

etc., of variables, though r is the correlation with but one other variable.
In the second part of a regression equation a variable may be said to be
“important” by definition in proportion to the size of the weights. It is
poor pedagogy to use verbal interpretations of loose significance for functions
of precise mathematical form. The latter is at best but an arbitrary
means for the classification of the data.

Professor Kelley states the theorem that the determination of the re- 
gression equation on the principle of least squares is equivalent to that of 
determining a linear function of the independent variables which will yield 
values that correlate most highly with the corresponding values of the de- 
pendent variable itself. He refers the student to Yule’s “Introduction” 
for an elementary proof. On referring to that book, however, the student 
will find nothing in the nature of a proof, though he may be satisfied by a 
casual statement of the author, without proof, to that effect. 

The last sentence of the author’s remarks is somewhat cryptic. One 
would be curious to discover the intended connotation. 

Harry LANGMAN. 


New York City. 


STATE CENSUSES IN 1915. 


Ten states took a census of the population in 1915. The published re- 
turns for these censuses are not entirely satisfactory from a statistical 
standpoint, however, as pointed out in Bulletin 133, Bureau of the Census, 
1916. Methods of enumeration, tabulation and publication, and the dates 
upon which the enumerations were made, differ widely. 

In New York State, the published count of the population is believed by 
competent observers to be far short of the actual number. In New York 
City several hundred thousand persons were excluded from the census whose 
births, marriages, deaths, and sicknesses are reportable. The group of 
excluded persons were: (1) persons in federal reservations, and in navy 
yards, army posts, marine hospital stations; (2) guests of hotels; (3) inmates 
of institutions; (4) residents of the city temporarily absent; (5) per- 
sons, like day laborers, who regularly leave the city for out-of-town employ- 
ment during the summer months. At a slight additional cost in money 
and pains, the New York State Census, taken primarily for legislative 
apportionment purposes, might have been made to serve also the health 
and social agencies of the state. 

In New Jersey, no specific indictments of the state census were made. 
The general dissatisfaction with the meagre results of the enumeration was 
expressed, however, in the repeal of the enabling legislation by Senate Bill 
No. 130, chapter 34, Laws of 1916. 

The bill under which the census was taken was enacted April 12, 1905, 
for the enumeration of that year. The New Jersey 1915 Census cost very 
nearly $100,000; $85,330 was paid to enumerators and supervisors. A 
voluntary committee representing the social and philanthropic agencies of 
the state had tried to have the scope of the enumeration and tabulation 
program of the census amended so as to provide data useful in the study of 
population movements which had taken place since 1910. No action by 
the state authorities followed this committee’s recommendation, however. 

Francis A. Walker in his article on the Eleventh Census of the United 
States (Quarterly Journal of Economics, Vol. 2, 1888, pp. 135-161) suggested 
a Federal Quinquennial Census. He indicated the failure of the provision 
of the March 3, 1879, census law for aiding states to take censuses at dates 
intermediate between the United States enumerations. Section 22 of this 
law stipulated that if state enumerations were taken on a date mean be- 
tween the dates of the federal censuses, on schedules and forms conforming 
in all respects to those of the federal census, the Secretary of the Treasury 
was empowered to pay to the governors of such states a sum equal to 50 
per cent. of the amount paid enumerators and supervisors in that state at 
the time of the next preceding federal census. A subsidy of this kind for 
the New Jersey 1915 Census would have more than paid for the costs of 
the extra tabulations asked for by the social workers of the state. It is 
sincerely hoped that provision for state censuses, and for federal 
coöperation in such enumerations, will be made in the Fourteenth Census 
Bill. The following brief tabulation from Bulletin 133, Bureau of the 
Census, summarizes the state census situation in 1915: 


STATES TAKING AN INTERDECENNIAL CENSUS UNDER STATE CONTROL IN 1915. 

                 Date of State Census. 
State.           First.           Last.            Official in Charge of Enumeration. 

[illegible]      1698             June 1, 1915     Secretary of State. 
[illegible]      17[illegible]    Mar. 1, 1915     Supt., Bureau of the Census. 
[illegible]      1726             June 1, 1915     Secretary of State. 
[illegible]      1765             Apr. 1, 1915     Director, Bureau of Statistics. 
[illegible]      1838             Dec. 31, 1914    Sec'y, Executive Council. 
[illegible]      1855             Mar. 1, 1915     Sec'y, State Board of Agriculture. 
[illegible]      1895             July 1, 1915     Commissioner of Agriculture. 
[illegible]      1895             May 1, 1915      Superintendent of Census. 
[illegible]      1905             Apr. 1, 1915     Secretary of State. 
[illegible]      1905             Apr. 5, 1915     Secretary of State. 

E. W. Kopf. 
New York City. 


“Interest Tables for Small Loans,” by Arthur H. Ham. The Spectator 
Company, New York. 52 pp. Price $4.00. 


This small volume from the pen of Mr. Ham, Director of the Division 
of Remedial Loans of the Russell Sage Foundation, contains a vast 
amount of valuable matter in a small space. There are two sets of 
tables in the volume, the first showing the amount of interest at the 
rates of 1, 1½, 2, 2½, 3 and 3½ per cent. per month on sums of 50 cents 
to $300 for periods of one day to thirty days. In the second place there 
are tables showing the amount of interest at these rates on loans of $10 
to $300, payable in 4, 6, 8, 10 and 12 equal monthly installments. In 
addition to the tables, there is an appendix containing the formulae for 
determining annual interest rate, amount of the interest charged, annual 
discount rate, and amount of the discount charge on loans payable in 
equal periodic installments computed on unpaid balances and not com- 
pounded. This volume should be of considerable service to lenders on 
chattel mortgages and wage assignments, to pawn-brokers, and to small 
money lending agencies. The book is attractively prepared and is ar- 
ranged so as to make easy a consultation of the tables. 
Wm. B. B. 
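
The kind of figure such tables record may be illustrated by a minimal sketch. It is not taken from Mr. Ham's volume, whose formulae are not reproduced in this notice. It assumes that the charge for part of a month is prorated on a thirty-day month, and that an installment loan is repaid in equal monthly parts of principal, with interest reckoned each month on the unpaid balance and not compounded. The function names and the thirty-day convention are assumptions made for illustration only.

```python
from fractions import Fraction


def short_term_interest(principal, monthly_rate_pct, days, days_in_month=30):
    """Simple interest for part of a month.

    Assumption: the monthly rate is prorated by days over a 30-day month.
    """
    rate = Fraction(monthly_rate_pct) / 100
    return Fraction(principal) * rate * Fraction(days, days_in_month)


def installment_interest(principal, monthly_rate_pct, n_installments):
    """Total interest on a loan repaid in n equal monthly parts of principal.

    Assumption: interest is charged each month on the balance still unpaid
    at the start of that month, and is never added to the balance.
    """
    principal = Fraction(principal)
    rate = Fraction(monthly_rate_pct) / 100
    payment = principal / n_installments
    balance = principal
    total = Fraction(0)
    for _ in range(n_installments):
        total += balance * rate   # interest on the unpaid balance only
        balance -= payment        # one equal part of principal repaid
    return total


# $100 at 2 per cent. per month, repaid in 10 equal monthly installments.
print(float(installment_interest(100, 2, 10)))   # 11.0
# $100 at 3 per cent. per month, held for 10 days.
print(float(short_term_interest(100, 3, 10)))    # 1.0
```

Under these assumptions the total charge on an installment loan equals the monthly rate times the principal times (n + 1)/2, which is why the $100 loan above bears $11 of interest rather than the $20 that would accrue if the whole principal stood unpaid for ten months.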